How to extract table data from PDF documents with Python

Published: 2021-07-22 09:50:36  Source: 億速云  Author: chen  Category: Big Data

This article covers how to extract table data from PDF documents with Python, a problem many people run into in practice.

The table extraction here is done with the camelot module.

camelot can be installed with pip:

pip install "camelot-py[cv]"

The sample PDF used in this walkthrough can be downloaded from:

http://gstcouncil.gov.in/sites/default/files/gst-revenue-collection-march3020.pdf

Step 1: read the PDF file.

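import camelot

# flavor='stream' infers columns from whitespace; page numbers start at 1, so '1-3' reads pages 1 to 3
tables = camelot.read_pdf('gst-revenue-collection-march3020.pdf', flavor='stream', pages='1-3')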

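The flavor parameter selects the parsing method: 'lattice' (the default) finds cells from the ruling lines drawn between them, while 'stream' infers rows and columns from the whitespace between pieces of text, which suits tables without borders.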
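To make the whitespace idea behind stream concrete, here is a toy sketch in plain Python. It is not camelot's algorithm (camelot works on the positions of text in the PDF); it simply splits lines of text into cells wherever two or more spaces appear. The function name split_columns, the two-space threshold and the sample rows are all made up for illustration.

```python
import re


def split_columns(lines, min_gap=2):
    """Split each text line into cells on runs of at least `min_gap` spaces.

    Toy illustration of whitespace-based column detection; camelot's
    stream flavor analyses character positions in the PDF instead.
    """
    gap = re.compile(r" {%d,}" % min_gap)
    rows = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # blank lines carry no cells
        rows.append(gap.split(stripped))
    return rows


# Made-up lines imitating a borderless table
sample = [
    "Name        Qty    Price",
    "Blue pen     12     1.50",
    "",
    "Red pencil   3      0.80",
]
print(split_columns(sample))
```

Single spaces inside a cell ("Blue pen") survive, which is why a gap threshold larger than one space is used.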

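The pages parameter chooses which pages to parse. It defaults to '1' and accepts page numbers and ranges such as '1,3', '1-3', '4-end' or 'all' (numbering starts at 1). For a table that spans several pages, list all of those pages; camelot returns each page's portion as a separate table.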
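The page string format can be illustrated with a small hypothetical helper, expand_pages, that turns a spec like '1,4-end' into a list of 1-based page numbers. It only mirrors the format described above and is not part of camelot, which parses the string itself; rejecting page 0 is this sketch's own choice, made to show that numbering starts at 1.

```python
def expand_pages(pages, page_count):
    """Expand a camelot-style page spec into a sorted list of 1-based pages.

    Hypothetical helper for illustration only. Accepts 'all', single
    pages ('3'), ranges ('1-3') and open-ended ranges ('4-end').
    """
    if pages == "all":
        return list(range(1, page_count + 1))
    result = set()
    for part in pages.split(","):
        part = part.strip()
        if "-" in part:
            start, end = part.split("-")
            last = page_count if end == "end" else int(end)
            result.update(range(int(start), last + 1))
        else:
            result.add(int(part))
    # Pages are numbered from 1, so 0 (or anything past the end) is an error
    bad = sorted(p for p in result if p < 1 or p > page_count)
    if bad:
        raise ValueError(f"page numbers out of range 1..{page_count}: {bad}")
    return sorted(result)


print(expand_pages("1-3", page_count=10))     # [1, 2, 3]
print(expand_pages("1,4-end", page_count=6))  # [1, 4, 5, 6]
```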

tables
tables[2]
tables[2].df

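Evaluating tables shows how many tables were found (its repr looks like <TableList n=N>); len(tables) returns the same count as an integer.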

tables[2] returns a single table, here the third one, since indexing starts at 0.

tables[2].df holds that table's contents as a pandas DataFrame, with every cell stored as a string.
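Since the cells come back as text, numeric columns usually need converting before any arithmetic. A minimal sketch, assuming commas are thousands separators, parentheses mark negatives, and '-' or 'NA' mean "no value"; these conventions and the name to_number are assumptions, so adjust them to your document.

```python
def to_number(cell):
    """Convert a text cell such as '1,234' or '(56.7)' to a number, or None.

    Assumed conventions: commas are thousands separators, parentheses
    mark negative values, and '-' or 'NA' mean the value is missing.
    """
    text = cell.strip().replace(",", "")
    if not text or text in {"-", "NA"}:
        return None
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]
    try:
        value = float(text)
    except ValueError:
        return None  # not numeric, e.g. a label cell
    if negative:
        value = -value
    # Return whole numbers as int so '1,234' becomes 1234 rather than 1234.0
    return int(value) if value.is_integer() else value


print(to_number("1,234"))   # 1234
print(to_number("(56.7)"))  # -56.7
```

With pandas 2.1 or later, one way to apply it to a table's body (skipping the header row and first column) would be tables[2].df.iloc[1:, 1:].map(to_number); older pandas versions name that method applymap.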

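To stack two DataFrames row-wise in pandas, use pd.concat(). Older tutorials use DataFrame.append(), which was deprecated in pandas 1.4 and removed in pandas 2.0.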

aa = {"A":[1,2,3],"B":[4,5,6]}
bb = {"A":[4],"B":[7]}
import pandas as pd
a = pd.DataFrame(aa)
b = pd.DataFrame(bb)
a.append(b)
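Because camelot returns each page's part of a multi-page table separately, pd.concat is also how those parts can be stitched back together. A minimal sketch, assuming all fragments have the same columns and that every page after the first repeats the header row; stitch_pages is a hypothetical helper and the sample rows are made up.

```python
import pandas as pd


def stitch_pages(frames, header_rows=1):
    """Stack table fragments from consecutive pages into one DataFrame.

    Assumes every fragment has the same columns and that fragments after
    the first repeat the header in their first `header_rows` rows.
    """
    if not frames:
        raise ValueError("no fragments to stitch")
    # Keep the first fragment whole, drop the repeated header from the rest
    parts = [frames[0]] + [f.iloc[header_rows:] for f in frames[1:]]
    return pd.concat(parts, ignore_index=True)


# Made-up fragments standing in for tables[i].df from consecutive pages
page1 = pd.DataFrame([["Name", "Qty"], ["Blue pen", "12"]])
page2 = pd.DataFrame([["Name", "Qty"], ["Red pencil", "3"]])
merged = stitch_pages([page1, page2])
print(merged)
```

With real output this might look like stitch_pages([tables[1].df, tables[2].df]), but only if you have checked that those two tables really are pieces of the same table.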

A related conversion, SVG to PDF, is described here:

https://www.tutorialexample.com/a-simple-guide-to-python-convert-svg-to-pdf-with-svglib-python-tutorial/

The conversion uses the svglib library, which can be installed with pip (it depends on reportlab, which does the PDF rendering):

pip install svglib

Code to convert an SVG file to PDF:

from svglib.svglib import svg2rlg
from reportlab.graphics import renderPDF

# Parse the SVG file into a ReportLab Drawing object
drawing = svg2rlg("home.svg")
# Render the drawing into a PDF file
renderPDF.drawToFile(drawing, "file.pdf")
