TensorFlow內存管理bfc算法的示例分析

發布時間：2021-08-23 10:27:11 來源：億速云閱讀：198 作者：小新欄目：開發技術

這篇文章主要介紹TensorFlow內存管理bfc算法的示例分析，文中介紹的非常詳細，具有一定的參考價值，感興趣的小伙伴們一定要看完！

1. 基本介紹

tensorflow設備內存管理模塊實現了一個best-fit with coalescing算法（后文簡稱bfc算法）。

bfc算法是Doung Lea's malloc(dlmalloc)的一個非常簡單的版本。

它具有內存分配、釋放、碎片管理等基本功能。

2. bfc基本算法思想

1. 數據結構

整個內存空間由一個按基址升序排列的Chunk雙向鏈表來表示，它們的直接前趨和后繼必須在地址連續的內存空間。Chunk結構體里含有實際大小、請求大小、是否被占用、基址、直接前趨、直接后繼、Bin索引等信息。

TensorFlow內存管理bfc算法的示例分析

2. 申請

用戶申請一個內存塊（malloc)。根據chunk雙鏈表找到一個合適的內存塊，如果該內存塊的大小是用戶申請的大小的二倍以上，那么就將該內存塊切分成兩塊，這就是split操作。

返回其中一塊給用戶，并將該內存塊標識為占用

Spilt操作會新增一個chunk，所以需要修改chunk雙鏈表以維持前驅和后繼關系

如果用戶申請512的空間，正好有一塊1024的chunk2是空閑的，由于1024/512 =2，所以chunk2 被split為2塊：chunk2_1和chunk2_2。返回chunk2_1給用戶并將其標志位占用狀態。

3. 釋放

用戶釋放一個內存塊（free）。先將該塊標記為空閑。然后根據chunk數據結構中的信息找到其前驅和后繼內存塊。如果前驅和后繼塊中有空閑的塊，那么將剛釋放的塊和空閑的塊合并成一個更大的chunk（這就是merge操作，合并當前塊和其前后的空閑塊）。再修改雙鏈表結構以維持前驅后繼關系。這就做到了內存碎片的回收。

如果用戶要free chunk3，由于chunk3的前驅chunk2也是空閑的，所以將chunk2和chunk3合并得到一個新的chunk2'，大小為chunk2和chunk3之和。

3. bins

1. bins數據結構

bfc算法采取的是被動分塊的策略。最開始整個內存是一個chunk，隨著用戶申請空間的次數增加，最開始的大chunk會被不斷的split開來，從而產生越來越多的小chunk。當chunk數量很大時，為了尋找一個合適的內存塊而遍歷雙鏈表無疑是一筆巨大的開銷。為了實現對空閑塊的高效管理，bfc算法設計了bin這個抽象數據結構。

每個bin都有一個size屬性，一個bin是一個擁有chunk size >= binsize的空閑chunk的集合。集合中的chunk按照chunk size的升序組織成單鏈表。bfc算法維護了一個bin的集合：bins。它由多個bin以及從屬于每個bin的chunks組成。內存中所有的空閑chunk都由bins管理。

TensorFlow內存管理bfc算法的示例分析

圖中每一列表示一個bin，列首方格中的數字表示bin的size。bin size的大小都是256的2^n的倍。每個bin下面掛載了一系列的空閑chunk，每個chunk的chunk size都大于等于所屬的bin的bin size，按照chunk size的升序掛載成單鏈表。

2. bins操作

bfc算法針對bins這個集合設計了三個操作：search、insert、delete。

search

給定一個chunk size，從bins中找到大于等于該chunksize的最小的那個空閑chunk。Search操作具體流程如下。如果bin以數組的形式組織，那么可以從index = chunk size /256 >>2的那個bin開始查找。最好的情況是開始查找的那個bin的chunk鏈表非空，那么直接返回鏈表頭即可。這種情況時間復雜度是常數級的。最壞的情況是遍歷bins數組中所有的bin。對于一般大小的內存來說，bins數組元素非常少，比如4G空間只需要23個bin就足夠了（256 * 2 ^ 23 > 4G），因此也很快能返回結果。總體來說search操作是非常高效的。對于固定大小內存來說，查找時間是常數量級的。

insert

將一個空閑的chunk插入到一個bin所掛載的chunk鏈表中，同時需要維持chunk鏈表的升序關系。具體流程是直接將chunk插入到index = chunk size /256 >>2的那個bin中即可。

delete

將一個空閑的chunk從bins中移除。

4. 總結

將內存分塊管理，按塊進行空間分配和釋放。

通過split操作將大內存塊分解成用戶需要的小內存塊。

通過merge操作合并小的內存塊，做到內存碎片回收

通過bin這個抽象數據結構實現對空閑塊高效管理。

5. 代碼分析

1. 代碼地址

https://github.com/tensorflow/tensorflow/tree/master/tensorflow/core/common_runtime

2. 數據結構

Chunk

static const int kInvalidChunkHandle = -1;
...
struct Chunk {
  size_t size = 0; // Full size of buffer.

  // We sometimes give chunks that are larger than needed to reduce
  // fragmentation. requested_size keeps track of what the client
  // actually wanted so we can understand whether our splitting
  // strategy is efficient.
  size_t requested_size = 0;

  // allocation_id is set to -1 when the chunk is not in use. It is assigned a
  // value greater than zero before the chunk is returned from
  // AllocateRaw, and this value is unique among values assigned by
  // the parent allocator.
  int64 allocation_id = -1;
  void* ptr = nullptr; // pointer to granted subbuffer.

  // If not kInvalidChunkHandle, the memory referred to by 'prev' is directly
  // preceding the memory used by this chunk. E.g., It should start
  // at 'ptr - prev->size'
  ChunkHandle prev = kInvalidChunkHandle;

  // If not kInvalidChunkHandle, the memory referred to by 'next' is directly
  // following the memory used by this chunk. E.g., It should be at
  // 'ptr + size'
  ChunkHandle next = kInvalidChunkHandle;

  // What bin are we in?
  BinNum bin_num = kInvalidBinNum;

  bool in_use() const { return allocation_id != -1; }
};

Bin

// A Bin is a collection of similar-sized free chunks.
struct Bin {
  // All chunks in this bin have >= bin_size memory.
  size_t bin_size = 0;

  struct ChunkComparator {
    explicit ChunkComparator(BFCAllocator* allocator)
      : allocator_(allocator) {}
    // Sort first by size and then use pointer address as a tie breaker.
    bool operator()(const ChunkHandle ha,
            const ChunkHandle hb) const NO_THREAD_SAFETY_ANALYSIS {
      const Chunk* a = allocator_->ChunkFromHandle(ha);
      const Chunk* b = allocator_->ChunkFromHandle(hb);
      if (a->size != b->size) {
        return a->size < b->size;
      }
      return a->ptr < b->ptr;
    }

    private:
      BFCAllocator* allocator_; // The parent allocator
  };

  typedef std::set<ChunkHandle, ChunkComparator> FreeChunkSet;
  // List of free chunks within the bin, sorted by chunk size.
  // Chunk * not owned.
  FreeChunkSet free_chunks;
  Bin(BFCAllocator* allocator, size_t bs)
    : bin_size(bs), free_chunks(ChunkComparator(allocator)) {}
};

AllocationRegion

AllocationRegion給一個連續的內存區域做指針到ChunkHandle的映射。

RegionManager

RegionManager聚集了一個或多個AllocationRegion，并提供一個從指針到基礎ChunkHandle的間接層，這個間接層可在多個不連續的內存區域進行分配。

3. 分配大小

將每次分配的內存大小調整為kMinAllocationSize的N倍，這樣所有內存地址都是很好地按字節對齊了。

// kMinAllocationSize = 256
static const size_t kMinAllocationBits = 8;
static const size_t kMinAllocationSize = 1 << kMinAllocationBits;
...
size_t BFCAllocator::RoundedBytes(size_t bytes) {
  size_t rounded_bytes =
    (kMinAllocationSize *
    ((bytes + kMinAllocationSize - 1) / kMinAllocationSize));
  DCHECK_EQ(size_t{0}, rounded_bytes % kMinAllocationSize);
  return rounded_bytes;
}

4. 初始化bin

typedef int BinNum;
static const int kInvalidBinNum = -1;
static const int kNumBins = 21;
...
// 二進制2^8往左移0，1，2位
// (static_cast<size_t>(256) << 0) = 256
// (static_cast<size_t>(256) << 1) = 512
// (static_cast<size_t>(256) << 2) = 1024
size_t BinNumToSize(BinNum index) {
  return static_cast<size_t>(256) << index;
}
...
char bins_space_[sizeof(Bin) * kNumBins];
// Map from bin size to Bin
Bin* BinFromIndex(BinNum index) {
  return reinterpret_cast<Bin*>(&(bins_space_[index * sizeof(Bin)]));
}
...
// We create bins to fit all possible ranges that cover the
// memory_limit_ starting from allocations up to 256 bytes to
// allocations up to (and including) the memory limit.
for (BinNum b = 0; b < kNumBins; b++) {
  size_t bin_size = BinNumToSize(b);
  VLOG(1) << "Creating bin of max chunk size "
      << strings::HumanReadableNumBytes(bin_size);
  new (BinFromIndex(b)) Bin(this, bin_size);
  CHECK_EQ(BinForSize(bin_size), BinFromIndex(b));
  CHECK_EQ(BinForSize(bin_size + 255), BinFromIndex(b));
  CHECK_EQ(BinForSize(bin_size * 2 - 1), BinFromIndex(b));
  if (b + 1 < kNumBins) {
    CHECK_NE(BinForSize(bin_size * 2), BinFromIndex(b));
  }
}

5. 查找bin

// 求屬于第幾個bin
BinNum BinNumForSize(size_t bytes) {
  uint64 v = std::max<size_t>(bytes, 256) >> kMinAllocationBits;
  int b = std::min(kNumBins - 1, Log2FloorNonZero(v));
  return b;
}
// 最高位非零的二進制位數，eg: 0001 0101B 為5
inline int Log2FloorNonZero(uint64 n) {
#if defined(__GNUC__)
  return 63 ^ __builtin_clzll(n);
#elif defined(PLATFORM_WINDOWS)
  unsigned long index;
  _BitScanReverse64(&index, n);
  return index;
#else
  int r = 0;
  while (n > 0) {
    r++;
    n >>= 1;
  }
  return r;
#endif
}

6. 查找Chunk

// 先加鎖
mutex_lock l(lock_);
void* ptr = FindChunkPtr(bin_num, rounded_bytes, num_bytes);
if (ptr != nullptr) {
  return ptr;
}
// FindChunkPtr函數內部
void* BFCAllocator::FindChunkPtr(BinNum bin_num, size_t rounded_bytes,
                 size_t num_bytes) {
  // First identify the first bin that could satisfy rounded_bytes.
  for (; bin_num < kNumBins; bin_num++) {
    // Start searching from the first bin for the smallest chunk that fits
    // rounded_bytes.
    Bin* b = BinFromIndex(bin_num);
    for (auto citer = b->free_chunks.begin(); citer != b->free_chunks.end();
        ++citer) {
      // 從之前得到的Bin索引開始，查找合適的空閑Chunk：
      const BFCAllocator::ChunkHandle h = (*citer);
      BFCAllocator::Chunk* chunk = ChunkFromHandle(h);
      DCHECK(!chunk->in_use());
      if (chunk->size >= rounded_bytes) {
        // We found an existing chunk that fits us that wasn't in use, so remove
        // it from the free bin structure prior to using.
        RemoveFreeChunkIterFromBin(&b->free_chunks, citer);

        // If we can break the size of the chunk into two reasonably
        // large pieces, do so.
        //
        // TODO(vrv): What should be the criteria when deciding when
        // to split? 
        // 具體實現后面會分析
        if (chunk->size >= rounded_bytes * 2) {
          SplitChunk(h, rounded_bytes);
          chunk = ChunkFromHandle(h); // Update chunk pointer in case it moved
        }

        // The requested size of the returned chunk is what the user
        // has allocated.
        chunk->requested_size = num_bytes;
        // Assign a unique id and increment the id counter, marking the
        // chunk as being in use.
        chunk->allocation_id = next_allocation_id_++;

        // Update stats.
        ++stats_.num_allocs;
        stats_.bytes_in_use += chunk->size;
        stats_.max_bytes_in_use =
          std::max(stats_.max_bytes_in_use, stats_.bytes_in_use);
        stats_.max_alloc_size =
          std::max<std::size_t>(stats_.max_alloc_size, chunk->size);

        VLOG(4) << "Returning: " << chunk->ptr;
        if (VLOG_IS_ON(4)) {
          LOG(INFO) << "A: " << RenderOccupancy();
        }
        return chunk->ptr;
      }
    }
  }
  return nullptr;
}

7. 拆分Chunk

如果Chunk的大小大于等于申請內存大小的2倍，那么將該Chunk拆分成2個：第一個Chunk的大小等于申請內存大小，第二個Chunk作為它的直接后繼。

if (chunk->size >= rounded_bytes * 2) {
  SplitChunk(h, rounded_bytes);
  chunk = ChunkFromHandle(h); // Update chunk pointer in case it moved
}

void BFCAllocator::SplitChunk(BFCAllocator::ChunkHandle h, size_t num_bytes) {
  // Allocate the new chunk before we do any ChunkFromHandle
  ChunkHandle h_new_chunk = AllocateChunk();

  Chunk* c = ChunkFromHandle(h);
  CHECK(!c->in_use() && (c->bin_num == kInvalidBinNum));

  // Create a new chunk starting num_bytes after c
  BFCAllocator::Chunk* new_chunk = ChunkFromHandle(h_new_chunk);
  new_chunk->ptr = static_cast<void*>(static_cast<char*>(c->ptr) + num_bytes);
  region_manager_.set_handle(new_chunk->ptr, h_new_chunk);

  // Set the new sizes of the chunks.
  new_chunk->size = c->size - num_bytes;
  c->size = num_bytes;

  // The new chunk is not in use.
  new_chunk->allocation_id = -1;

  // Maintain the pointers.
  // c <-> c_neighbor becomes
  // c <-> new_chunk <-> c_neighbor
  BFCAllocator::ChunkHandle h_neighbor = c->next;
  new_chunk->prev = h;
  new_chunk->next = h_neighbor;
  c->next = h_new_chunk;
  if (h_neighbor != kInvalidChunkHandle) {
    Chunk* c_neighbor = ChunkFromHandle(h_neighbor);
    c_neighbor->prev = h_new_chunk;
  }

  // Add the newly free chunk to the free bin.
  InsertFreeChunkIntoBin(h_new_chunk);
}

8. 回收chunk

加鎖，獲得ChunkHandle

mutex_lock l(lock_);
BFCAllocator::ChunkHandle h = region_manager_.get_handle(ptr);
FreeAndMaybeCoalesce(h);

FreeAndMaybeCoalesce

void BFCAllocator::FreeAndMaybeCoalesce(BFCAllocator::ChunkHandle h) {
  Chunk* c = ChunkFromHandle(h);
  CHECK(c->in_use() && (c->bin_num == kInvalidBinNum));

  // Mark the chunk as no longer in use
  c->allocation_id = -1;

  // Updates the stats.
  stats_.bytes_in_use -= c->size;

  // This chunk is no longer in-use, consider coalescing the chunk
  // with adjacent chunks.
  ChunkHandle chunk_to_reassign = h;

  // If the next chunk is free, coalesce the two
  if (c->next != kInvalidChunkHandle) {
    Chunk* cnext = ChunkFromHandle(c->next);
    if (!cnext->in_use()) {
    //   VLOG(8) << "Chunk at " << cnext->ptr << " merging with c " <<
    //   c->ptr;

    chunk_to_reassign = h;

    // Deletes c->next
    RemoveFreeChunkFromBin(c->next);
    Merge(h, ChunkFromHandle(h)->next);
    }
  }

  // If the previous chunk is free, coalesce the two
  c = ChunkFromHandle(h);
  if (c->prev != kInvalidChunkHandle) {
    Chunk* cprev = ChunkFromHandle(c->prev);
    if (!cprev->in_use()) {
    //   VLOG(8) << "Chunk at " << c->ptr << " merging into c->prev "
    //    << cprev->ptr;

    chunk_to_reassign = c->prev;

    // Deletes c
    RemoveFreeChunkFromBin(c->prev);
    Merge(ChunkFromHandle(h)->prev, h);
    c = ChunkFromHandle(h);
    }
  }

  InsertFreeChunkIntoBin(chunk_to_reassign);
}

Merge

// Merges h2 and h3 when Chunk(h2)->next is h3 and Chunk(h3)->prev is c1.
// We merge Chunk(h3) into Chunk(h2).
void BFCAllocator::Merge(BFCAllocator::ChunkHandle h2,
             BFCAllocator::ChunkHandle h3) {
  Chunk* c1 = ChunkFromHandle(h2);
  Chunk* c2 = ChunkFromHandle(h3);
  // We can only merge chunks that are not in use.
  CHECK(!c1->in_use() && !c2->in_use());

  // c1's prev doesn't change, still points to the same ptr, and is
  // still not in use.

  // Fix up neighbor pointers
  //
  // c1 <-> c2 <-> c3 should become
  // c1 <-> c3

  BFCAllocator::ChunkHandle h4 = c2->next;
  c1->next = h4;
  CHECK(c2->prev == h2);
  if (h4 != kInvalidChunkHandle) {
    BFCAllocator::Chunk* c3 = ChunkFromHandle(h4);
    c3->prev = h2;
  }

  // Set the new size
  c1->size += c2->size;

  DeleteChunk(h3);
}

以上是“TensorFlow內存管理bfc算法的示例分析”這篇文章的所有內容，感謝各位的閱讀！希望分享的內容對大家有幫助，更多相關知識，歡迎關注億速云行業資訊頻道！

向AI問一下細節

91超碰碰碰碰久久久久久综合_超碰av人澡人澡人澡人澡人掠_国产黄大片在线观看画质优化_txt小说免费全本

TensorFlow內存管理bfc算法的示例分析

猜你喜歡

91超碰碰碰碰久久久久久综合_超碰av人澡人澡人澡人澡人掠_国产黄大片在线观看画质优化_txt小说免费全本

TensorFlow內存管理bfc算法的示例分析

猜你喜歡

最新資訊

相關推薦

相關標簽