疎行列

疎行列の例 $\left({\begin{smallmatrix}11&22&0&0&0&0&0\\0&33&44&0&0&0&0\\0&0&55&66&77&0&0\\0&0&0&0&0&88&0\\0&0&0&0&0&0&99\\\end{smallmatrix}}\right)$ 上の疎行列には非ゼロ要素が9個しかなく、ゼロ要素は26個ある。スパース性は74%であり、密度は26%である。

数値解析と計算科学の分野において、疎行列（そぎょうれつ、英語: sparse matrix）または疎配列（英語: sparse array）とは、成分のほとんどが零である行列のことをいう^[1]。スパース行列とも言う。行列が疎であると判定するためのゼロの値を持つ要素の割合について厳密な定義はないが、一般的な条件としては、非ゼロ要素の数が行数または列数におおよそ近いものである。逆に、ほとんどの要素が非ゼロ要素である行列は、密な（dense）行列であると見なされる^[1]。行列のゼロ要素の数を要素数の合計で割った値を、行列のスパース性（sparsity）と呼ぶことがある。

概念的には、スパース性はペアワイズ相互作用をほとんど持たないシステムに対応する。たとえば、隣同士がバネで接続されたボールの線について考えると、各ボールは隣接するボールのみと組になっているため、これはスパースなシステムである。対称的に、同じボールの線でも、1つのボールが他のすべてのボールとバネでつながっている場合、このシステムは密行列と対応する。スパース性の概念は、組み合せ論や、通常、重要なデータや接続の密度が低くなるネットワーク理論・数値解析などの応用領域で役に立つ。巨大な疎行列は、偏微分方程式を解くときに科学や工学のアプリケーションによく現れる。

コンピューター上で疎行列の保存や操作を行うときには、行列のスパースな構造を利用した特別なアルゴリズムとデータ構造を使用することが有益であり、多くの場合には必要になる。機械学習の分野では疎行列がよく用いられるため^[2]、疎行列に特化したコンピューターも作られている^[3]。標準的な密行列の構造とアルゴリズムを対象とする操作は、巨大な疎行列に適用する場合には処理とメモリがゼロ値で無駄になり、遅くて非効率である。スパースなデータは本質的により簡単に圧縮されるため、必要なストレージが非常に小さくなる。非常に巨大な疎行列に対しては、標準的な密行列で使用する操作を適用することができる場合もある。

有限差分法、ある有限体積法、有限要素法などで離散化された偏微分方程式は、一般に疎行列を係数行列とした連立一次方程式となる。

数値解析の分野では、疎行列を前提とした解法が多い。疎行列の非零要素だけを工夫してうまく格納することにより、大次元の問題を扱うことが容易になる。また、たとえば比較的少ない手間でベクトルと行列の積を計算できるなどの利点がある。ランダムメモリアクセスを多用する疎行列を用いた計算処理はベクトルプロセッサが得意としており、一般的なスカラ型CPUやGPGPUでは未だに苦手とする処理である^[4]^{[注釈 1]}。

格納形式

行列は、典型的には2次元の配列に格納される。配列の各要素は、行列の要素 $a i, j$ を表し、2つのインデックス $i$ と $j$ を用いてアクセスされる。慣習として、 $i$ は上から下に数えた行のインデックスを指し、 $j$ は左から右に数えた列のインデックスを指す。 $m \times n$ 行列の場合、このフォーマットで行列を格納するのに必要なメモリ量は、 $m \times n$ に比例する（忘れられることが多いが、行列の次元も格納する必要がある）。

疎行列の場合、非ゼロ要素のみを保存することで、必要メモリ容量の大幅な削減が実現できる。非ゼロ要素の数と分散によって、異なるデータ構造を利用することで、基本的なアプローチに比べてメモリ量の大幅な節約が可能になる。トレードオフは、各要素へのアクセスがより複雑になり、オリジナルの行列を曖昧さなく復元できるようにするために追加の構造が必要になることである。

このため疎行列を格納するための様々な形式（フォーマット）が存在する。

フォーマットは2つのグループに分けられる。

効率的な編集をサポートするフォーマット
- DOK（Dictionary of keys）
- LIL（List of lists）
- COO
効率的なアクセスと行列操作をサポートするフォーマット
- CSR
- CSC
- BSR: ブロック疎行列（Block Sparse matrix）向け

以下の名称は、Netlibで使われているもの^[5]やIntel Math Kernel Library^[6]、SciPyで使われているものに基づく。例として次の疎行列Aを考える。

${\begin{bmatrix}1&2&3&0\\0&0&0&1\\2&0&0&2\\0&0&0&1\\\end{bmatrix}}$

Dictionary of Key

Dictionary of Key（DOK）は、(行, 列) をキーにして連想配列に入れる方式である。

リストのリスト

リストのリスト（英: List of lists, LIL）は、行ごとにリストを作り、そのリストの中に (列, 値) のタプルを入れる方式である。

座標

座標（英: Coordinate, COO）形式は [値, 行インデックス, 列インデックス] タプルの集合で行列を表現する方式である^[7]。

行列Aの要素を座標（インデックス）とともに並べると次のようになる。

A  = [1 2 3 0 0 0 0 1 2 0 0 2 0 0 0 1] # 値
IA = [1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4] # 行インデックス
JA = [1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4] # 列インデックス

ここで「存在しない値をゼロ要素とする」と定めるとゼロ要素をすべて削除できる。これにより得られる、

A  = [1 2 3 1 2 2 1] # 値
IA = [1 1 1 2 3 3 4] # 行インデックス
JA = [1 2 3 4 1 4 4] # 列インデックス

が疎行列AのCOO形式による表現である。

COO行列のゼロ要素を非ゼロに編集したい場合、後ろに非ゼロタプルを追加するだけでよいため編集効率が良い。

圧縮行格納

圧縮行格納（英: Compressed Row Storage, CRS）形式は行インデックス配列を圧縮する方式である。別名は Compressed Sparse Row（CSR）形式^[8]。

CSR方式ではまず2次元の行列を行方向に並べる。次に「存在しない値をゼロ要素とする」と定め、ゼロ要素をすべて削除する。この段階で行・列インデックスとともに並べると次のようになる。

data = [1 2 3 1 2 2 1] # 値
IA   = [1 1 1 2 3 3 4] # 行インデックス
JA   = [1 2 3 4 1 4 4] # 列インデックス

ここで行インデックス配列（IA）に着目する。現在は各要素が明示的に行インデックスを持っているが、行の切れ目さえわかっていれば、これは自動的に導ける。例えば IA[1] = IA[2] = IA[3] = 1であるが、「1行目は1要素目から、2行目は4要素目から」とわかっていれば、IA[1:4]=[1 1 1]を即座に導ける。これはCSR方式が行ごとに並べたうえでゼロ要素を削除する規則に由来している。

この行インデックス表現は行headポインタの配列と見なせる。すなわちindptr = [ptr_row_1 ptr_row2 ...]である。インデックスを直接示す配列は列インデックス配列JAのみになったので、これをindicesと改名する^[9]。これにより得られる、

data    = [1 2 3 1 2 2 1] # 値
indices = [1 2 3 4 1 4 4] # 列インデックス
indptr  = [1 4 5 7]       # 行Headポインタ

が疎行列AのCSR形式による表現である。

CSR形式は行へのアクセスに優れている^[10]。1行目にアクセスする場合、データを data[indptr[1]:indptr[2]] で取得し列インデックスを indices[indptr[1]:indptr[2]] で取得できる（行インデックスは当然1）^[9]。対照的にCOO形式であればまず行インデックス配列IAを全長走査しIA[k] == 1に該当する要素番号kをリストアップし、そのうえでdata[k], indices[k]をによるアクセスを全kに関しておこなう必要がある。

対照的に、CSR形式は列へのアクセスに劣る^[11]。1列目にアクセスする場合、indicesを全長走査しindices[k] == 1に該当する要素番号kをリストアップしたのち、行インデックスを得るためにindptrを走査して各kに大してindptr[n] <= k <indptr[n+1]を満たすnを見つける必要がある。

圧縮列格納

圧縮列格納（英: Compressed Column Storage, CCS）形式はCRSを列単位にしたものである。別名は Compressed Sparse Column (CSC) 形式^[12]。

圧縮対角格納

圧縮対角格納（Compressed Diagonal Storage、CDS）形式やDiagonal（DIA）は、CRS・CSR を対角行列単位にしたものである。

スカイライン格納方式（SKS、SKY）

三角行列のために用いられる。

ブロック圧縮行格納

ブロック圧縮行格納（英: Block Compressed Row Storage, BCRS）形式はCRSをブロック単位にしたものである。別名は Block Sparse Row (BSR) 形式^[13]。

外部リンク

脚注

[脚注の使い方]

注釈

^ 疎行列にアクセスする際に行われる、巨大な配列データを大域的にインデックス参照で引いてくるランダムメモリアクセスを多用する操作は、一般的なスカラ型のCPUやGPGPUにとっては苦手な処理である。

出典

^ ^a ^b Yan, Di; Wu, Tao; Liu, Ying; Gao, Yang (2017). An efficient sparse-dense matrix multiplication on a multicore system. IEEE. doi:10.1109/icct.2017.8359956. ISBN 978-1-5090-3944-9. The computation kernel of DNN is large sparse-dense matrix multiplication. In the field of numerical analysis, a sparse matrix is a matrix populated primarily with zeros as elements of the table. By contrast, if the number of non-zero elements in a matrix is relatively large, then it is commonly considered a dense matrix. The fraction of zero elements (non-zero elements) in a matrix is called the sparsity (density). Operations using standard dense-matrix structures and algorithms are relatively slow and consume large amounts of memory when applied to large sparse matrices.
^ "Argonne National Laboratory Deploys Cerebras CS-1, the World's Fastest Artificial Intelligence Computer | Argonne National Laboratory". www.anl.gov (Press release) (英語). 2019年12月2日閲覧。The WSE is the largest chip ever made at 46,225 square millimeters in area, it is 56.7 times larger than the largest graphics processing unit. It contains 78 times more AI optimized compute cores, 3,000 times more high speed, on-chip memory, 10,000 times more memory bandwidth, and 33,000 times more communication bandwidth.
^ “Cerebras Systems Unveils the Industry's First Trillion Transistor Chip” (英語). www.businesswire.com (2019年8月19日). 2019年12月2日閲覧。 “The WSE contains 400,000 AI-optimized compute cores. Called SLAC™ for Sparse Linear Algebra Cores, the compute cores are flexible, programmable, and optimized for the sparse linear algebra that underpins all neural network computation”
^ “プロセッサ開発のセンス～第4回ベクトル・プロセッサ～ | 株式会社エヌエスアイテクス（NSITEXE,Inc.）” (2023年2月22日). 2023年6月18日閲覧。
^ Survey of Sparse Matrix Storage Formats
^ Intel® MKL Sparse BLAS Overview | Intel® Developer Zone
^ "scipy.sparse.coo_matrix ... A sparse matrix in COOrdinate format." scipy.sparse.coo_matrix. scipy. 2022-03-05閲覧.
^ "scipy.sparse.csr_matrix ... Compressed Sparse Row matrix" scipy.sparse.csr_matrix. scipy. 2022-03-05閲覧.
^ ^a ^b "csr_matrix((data, indices, indptr) ... is the standard CSR representation where the column indices for row i are stored in indices[indptr[i]:indptr[i+1]] and their corresponding values are stored in data[indptr[i]:indptr[i+1]]." scipy.sparse.csr_matrix. scipy. 2022-03-05閲覧.
^ "Advantages of the CSR format ... efficient row slicing" scipy.sparse.csr_matrix. scipy. 2022-03-05閲覧.
^ "Disadvantages of the CSR format slow column slicing operations" scipy.sparse.csr_matrix. scipy. 2022-03-05閲覧.
^ "scipy.sparse.csc_matrix ... Compressed Sparse Column matrix" scipy.sparse.csc_matrix. scipy. 2022-03-05閲覧.
^ "scipy.sparse.bsr_matrix ... Block Sparse Row matrix" scipy.sparse.bsr_matrix. 2022-03-05閲覧.

[5] 疎行列にアクセスする際に行われる、巨大な配列データを大域的にインデックス参照で引いてくるランダムメモリアクセスを多用する操作は、一般的なスカラ型のCPUやGPGPUにとっては苦手な処理である。

[Yan_Wu_Liu_Gao_2017_p.-1] Yan, Di; Wu, Tao; Liu, Ying; Gao, Yang (2017). An efficient sparse-dense matrix multiplication on a multicore system. IEEE. doi:10.1109/icct.2017.8359956. ISBN 978-1-5090-3944-9. The computation kernel of DNN is large sparse-dense matrix multiplication. In the field of numerical analysis, a sparse matrix is a matrix populated primarily with zeros as elements of the table. By contrast, if the number of non-zero elements in a matrix is relatively large, then it is commonly considered a dense matrix. The fraction of zero elements (non-zero elements) in a matrix is called the sparsity (density). Operations using standard dense-matrix structures and algorithms are relatively slow and consume large amounts of memory when applied to large sparse matrices.

[2] "Argonne National Laboratory Deploys Cerebras CS-1, the World's Fastest Artificial Intelligence Computer | Argonne National Laboratory". www.anl.gov (Press release) (英語). 2019年12月2日閲覧。The WSE is the largest chip ever made at 46,225 square millimeters in area, it is 56.7 times larger than the largest graphics processing unit. It contains 78 times more AI optimized compute cores, 3,000 times more high speed, on-chip memory, 10,000 times more memory bandwidth, and 33,000 times more communication bandwidth.

[3] “Cerebras Systems Unveils the Industry's First Trillion Transistor Chip” (英語). www.businesswire.com (2019年8月19日). 2019年12月2日閲覧。 “The WSE contains 400,000 AI-optimized compute cores. Called SLAC™ for Sparse Linear Algebra Cores, the compute cores are flexible, programmable, and optimized for the sparse linear algebra that underpins all neural network computation”

[4] “プロセッサ開発のセンス～第4回ベクトル・プロセッサ～ | 株式会社エヌエスアイテクス（NSITEXE,Inc.）” (2023年2月22日). 2023年6月18日閲覧。

[6] Survey of Sparse Matrix Storage Formats

[7] Intel® MKL Sparse BLAS Overview | Intel® Developer Zone

[8] "scipy.sparse.coo_matrix ... A sparse matrix in COOrdinate format." scipy.sparse.coo_matrix. scipy. 2022-03-05閲覧.

[9] "scipy.sparse.csr_matrix ... Compressed Sparse Row matrix" scipy.sparse.csr_matrix. scipy. 2022-03-05閲覧.

[:0-10] "csr_matrix((data, indices, indptr) ... is the standard CSR representation where the column indices for row i are stored in indices[indptr[i]:indptr[i+1]] and their corresponding values are stored in data[indptr[i]:indptr[i+1]]." scipy.sparse.csr_matrix. scipy. 2022-03-05閲覧.

[11] "Advantages of the CSR format ... efficient row slicing" scipy.sparse.csr_matrix. scipy. 2022-03-05閲覧.

[12] "Disadvantages of the CSR format slow column slicing operations" scipy.sparse.csr_matrix. scipy. 2022-03-05閲覧.

[13] "scipy.sparse.csc_matrix ... Compressed Sparse Column matrix" scipy.sparse.csc_matrix. scipy. 2022-03-05閲覧.

[14] "scipy.sparse.bsr_matrix ... Block Sparse Row matrix" scipy.sparse.bsr_matrix. 2022-03-05閲覧.

[1]

[2]

[3]

[4]

[注釈 1]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]