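type Measurement struct {
	Name       string `json:"name,omitempty"`
	fieldNames map[string]struct{} // set of field names seen in this measurement

	seriesByID          map[uint64]*Series              // lookup table: series ID -> series
	seriesByTagKeyValue map[string]map[string]SeriesIDs // tag key -> tag value -> series IDs
	seriesIDs           SeriesIDs                       // IDs of all series in this measurement
}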
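A query passes through three layers of mapping: RetentionPolicy, ShardGroup, and Shard. Each layer is resolved by timestamp, so the work of looking up a key is dispatched to individual shards, and the problem reduces to a query inside a single shard.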
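To query a shard, InfluxDB loads the index section of the shard's TSM files and builds an index in memory. A binary search over that index finds the range that matches key + timestamp, yielding beg and end. The index entries in that range give the addresses of the blocks, and decompressing those blocks yields the time series data.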
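A minimal sketch of the binary search step, assuming the index entries for one key are sorted by time and do not overlap. `IndexEntry` and `blocksForRange` are hypothetical names for illustration, and the key string in `main` is only an example; the real TSM index layout has more detail than this.

```go
package main

import (
	"fmt"
	"sort"
)

// IndexEntry is a simplified stand-in for one TSM index entry. It describes
// a single compressed block holding points in [MinTime, MaxTime].
type IndexEntry struct {
	MinTime, MaxTime int64  // timestamps
	Offset           int64  // byte offset of the block in the file
	Size             uint32 // length of the block in bytes
}

// blocksForRange returns the half-open range [beg, end) of entries whose
// blocks may contain points in [tmin, tmax]. Entries must be sorted by time
// and non-overlapping.
func blocksForRange(entries []IndexEntry, tmin, tmax int64) (beg, end int) {
	// First block whose MaxTime reaches tmin.
	beg = sort.Search(len(entries), func(i int) bool { return entries[i].MaxTime >= tmin })
	// First block that starts after tmax.
	end = sort.Search(len(entries), func(i int) bool { return entries[i].MinTime > tmax })
	return beg, end
}

func main() {
	// In-memory index: series key + field -> sorted block entries.
	index := map[string][]IndexEntry{
		"cpu,host=a usage": {
			{MinTime: 0, MaxTime: 99, Offset: 5, Size: 40},
			{MinTime: 100, MaxTime: 199, Offset: 45, Size: 38},
			{MinTime: 200, MaxTime: 299, Offset: 83, Size: 41},
		},
	}
	entries := index["cpu,host=a usage"]
	beg, end := blocksForRange(entries, 150, 250)
	for _, e := range entries[beg:end] {
		// A real reader would seek to e.Offset and decompress e.Size bytes here.
		fmt.Printf("read block at offset %d (%d bytes)\n", e.Offset, e.Size)
	}
}
```

With the query range 150 to 250, the search selects the second and third entries, so only those two blocks would be read and decompressed.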
Compactor
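The compactor runs continuously in the background and checks once every second whether any data needs to be compacted and merged.
It performs two main operations:
Cache => TSM file. When the data in the cache reaches a size threshold, the cache is snapshotted and the snapshot is then written to a new TSM file.
Merging and compacting TSM files. Multiple small TSM files are merged into one so that each file gets as close as possible to the maximum single-file size, which reduces the number of files. Some data deletions are also carried out at this stage.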
Compaction converts data from a write-optimized format into a read-optimized format. InfluxDB has several kinds of compaction.
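LevelCompaction: InfluxDB divides TSM files into 4 levels (Level 1-4). Compaction only happens among files of the same level, and the files it produces are promoted to the next level. Given how time series data is generated, this rule means that the higher the level, the older the data and the less frequently it is accessed. A TSM file first produced from Cache data is called a Snapshot. Compacting several Snapshot files produces a Level 1 TSM file, compacting Level 1 files produces Level 2 files, and so on. Low-level and high-level compactions use different algorithms: compaction of low-level files keeps CPU usage low, for example by skipping decompression and block merging, while compaction of high-level files decompresses and merges blocks to further improve the compression ratio.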
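The promotion rule can be sketched as a small planner. Everything here is a simplified illustration: `TSMFile`, `Plan`, and `planLevelCompactions` are hypothetical names, snapshots are modelled as level 0, and both the trigger of 4 files per level and the level at which block merging starts are assumptions chosen for the example rather than values taken from InfluxDB.

```go
package main

import "fmt"

const (
	maxLevel = 4
	// Assumed trigger: compact a level once it holds this many files.
	minFilesPerCompaction = 4
	// Assumed boundary: outputs at this level or above decompress and merge blocks.
	firstMergingLevel = 3
)

// TSMFile is a simplified stand-in for a TSM file on disk.
type TSMFile struct {
	Name  string
	Level int // 0 = snapshot, 1..4 = compaction levels
}

// Plan describes one compaction: inputs from a single level and the level
// of the file it produces.
type Plan struct {
	Inputs      []TSMFile
	OutputLevel int
	MergeBlocks bool // false = cheap copy of compressed blocks
}

// planLevelCompactions groups files by level and plans one compaction for
// every level below maxLevel that has enough files. Files are never mixed
// across levels, and the output is promoted by exactly one level.
func planLevelCompactions(files []TSMFile) []Plan {
	byLevel := make(map[int][]TSMFile)
	for _, f := range files {
		byLevel[f.Level] = append(byLevel[f.Level], f)
	}
	var plans []Plan
	for level := 0; level < maxLevel; level++ {
		group := byLevel[level]
		if len(group) < minFilesPerCompaction {
			continue
		}
		out := level + 1
		plans = append(plans, Plan{
			Inputs:      group,
			OutputLevel: out,
			MergeBlocks: out >= firstMergingLevel,
		})
	}
	return plans
}

func main() {
	files := []TSMFile{
		{"s1", 0}, {"s2", 0}, {"s3", 0}, {"s4", 0}, // four snapshots
		{"a", 2}, {"b", 2}, {"c", 2}, {"d", 2}, // four level-2 files
		{"x", 1}, // a lone level-1 file, left alone
	}
	for _, p := range planLevelCompactions(files) {
		fmt.Printf("compact %d files -> level %d (merge blocks: %v)\n",
			len(p.Inputs), p.OutputLevel, p.MergeBlocks)
	}
}
```

Level 4 files are deliberately left out of this planner; in the description, they are handled by the index optimization and full compaction passes.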
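IndexOptimizationCompaction: Once a certain number of Level 4 files have accumulated, the index becomes large and queries slow down. The main cause is that the data of a single time series is spread across multiple TSM files, so a query inevitably has to combine data from several files. The main job of IndexOptimizationCompaction is therefore to merge the data of each time series into the same TSM file, minimizing how much the series in different TSM files overlap.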
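To illustrate the idea, the sketch below first measures how many files each series is spread across, then reorders the blocks so that each series is stored contiguously, as it would be in a single output file. `Block`, `filesPerSeries`, and `regroup` are hypothetical names, and the sketch ignores file size limits and the actual on-disk format.

```go
package main

import (
	"fmt"
	"sort"
)

// Block is a simplified stand-in for one compressed block in a TSM file.
type Block struct {
	Key     string // series key
	MinTime int64  // earliest timestamp in the block
	Source  string // file the block currently lives in
}

// filesPerSeries counts how many distinct files hold blocks for each series.
// A count above 1 means a query for that series must read several files.
func filesPerSeries(blocks []Block) map[string]int {
	seen := make(map[string]map[string]struct{})
	for _, b := range blocks {
		if seen[b.Key] == nil {
			seen[b.Key] = make(map[string]struct{})
		}
		seen[b.Key][b.Source] = struct{}{}
	}
	counts := make(map[string]int, len(seen))
	for key, files := range seen {
		counts[key] = len(files)
	}
	return counts
}

// regroup returns a copy of blocks ordered by series key and then by time,
// so that all blocks of one series sit next to each other.
func regroup(blocks []Block) []Block {
	out := append([]Block(nil), blocks...)
	sort.Slice(out, func(i, j int) bool {
		if out[i].Key != out[j].Key {
			return out[i].Key < out[j].Key
		}
		return out[i].MinTime < out[j].MinTime
	})
	return out
}

func main() {
	blocks := []Block{
		{Key: "cpu,host=a", MinTime: 200, Source: "f2.tsm"},
		{Key: "mem,host=a", MinTime: 0, Source: "f1.tsm"},
		{Key: "cpu,host=a", MinTime: 0, Source: "f1.tsm"},
		{Key: "cpu,host=a", MinTime: 400, Source: "f3.tsm"},
	}
	fmt.Println("files per series before:", filesPerSeries(blocks))
	for _, b := range regroup(blocks) {
		fmt.Printf("%s t=%d (from %s)\n", b.Key, b.MinTime, b.Source)
	}
}
```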
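FullCompaction: Once InfluxDB determines that a Shard will not receive writes for a long time, it runs a FullCompaction on the data. FullCompaction combines LevelCompaction and IndexOptimization. After one FullCompaction, the Shard undergoes no further compaction unless new data is written or a delete occurs. This strategy tidies up cold data, and its main purpose is to improve the compression ratio.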
reporting-disabled = false
bind-address = "127.0.0.1:8088"
[meta]
dir = "/Users/bytedance/.influxdb/meta"
retention-autocreate = true
logging-enabled = true
[data]
dir = "/Users/bytedance/.influxdb/data"
index-version = "inmem"
wal-dir = "/Users/bytedance/.influxdb/wal"
wal-fsync-delay = "0s"
validate-keys = false
query-log-enabled = true
cache-max-memory-size = 1073741824
cache-snapshot-memory-size = 26214400
cache-snapshot-write-cold-duration = "10m0s"
compact-full-write-cold-duration = "4h0m0s"
compact-throughput = 50331648
compact-throughput-burst = 50331648
max-series-per-database = 1000000
max-values-per-tag = 100000
max-concurrent-compactions = 0
max-index-log-file-size = 1048576
series-id-set-cache-size = 0
series-file-max-concurrent-snapshot-compactions = 0
trace-logging-enabled = false
tsm-use-madv-willneed = false
[coordinator]
write-timeout = "10s"
max-concurrent-queries = 0
query-timeout = "0s"
log-queries-after = "0s"
max-select-point = 0
max-select-series = 0
max-select-buckets = 0
[retention]
enabled = true
check-interval = "30m0s"
[shard-precreation]
enabled = true
check-interval = "10m0s"
advance-period = "30m0s"
[monitor]
store-enabled = true
store-database = "_internal"
store-interval = "10s"
[subscriber]
enabled = true
http-timeout = "30s"
insecure-skip-verify = false
ca-certs = ""
write-concurrency = 40
write-buffer-size = 1000
[http]
enabled = true
bind-address = ":8086"
auth-enabled = false
log-enabled = true
suppress-write-log = false
write-tracing = false
flux-enabled = false
flux-log-enabled = false
pprof-enabled = true
pprof-auth-enabled = false
debug-pprof-enabled = false
ping-auth-enabled = false
prom-read-auth-enabled = false
https-enabled = false
https-certificate = "/etc/ssl/influxdb.pem"
https-private-key = ""
max-row-limit = 0
max-connection-limit = 0
shared-secret = ""
realm = "InfluxDB"
unix-socket-enabled = false
unix-socket-permissions = "0777"
bind-socket = "/var/run/influxdb.sock"
max-body-size = 25000000
access-log-path = ""
max-concurrent-write-limit = 0
max-enqueued-write-limit = 0
enqueued-write-timeout = 30000000000
[logging]
format = "auto"
level = "info"
suppress-logo = false
[[graphite]]
enabled = false
bind-address = ":2003"
database = "graphite"
retention-policy = ""
protocol = "tcp"
batch-size = 5000
batch-pending = 10
batch-timeout = "1s"
consistency-level = "one"
separator = "."
udp-read-buffer = 0
[[collectd]]
enabled = false
bind-address = ":25826"
database = "collectd"
retention-policy = ""
batch-size = 5000
batch-pending = 10
batch-timeout = "10s"
read-buffer = 0
typesdb = "/usr/share/collectd/types.db"
security-level = "none"
auth-file = "/etc/collectd/auth_file"
parse-multivalue-plugin = "split"
[[opentsdb]]
enabled = false
bind-address = ":4242"
database = "opentsdb"
retention-policy = ""
consistency-level = "one"
tls-enabled = false
certificate = "/etc/ssl/influxdb.pem"
batch-size = 1000
batch-pending = 5
batch-timeout = "1s"
log-point-errors = true
[[udp]]
enabled = false
bind-address = ":8089"
database = "udp"
retention-policy = ""
batch-size = 5000
batch-pending = 10
read-buffer = 0
batch-timeout = "1s"
precision = ""
[continuous_queries]
log-enabled = true
enabled = true
query-stats-enabled = false
run-interval = "1s"
[tls]
min-version = ""
max-version = ""