From Zero to One: Building an Enterprise-Grade ELK Log Analysis System, Step by Step
👋 Introduction: Why ELK?
In modern distributed systems and microservice architectures, log management has become a key part of operations and troubleshooting. Once the number of services grows from a handful to dozens or even hundreds, the traditional way of reading logs (SSHing into each server and paging through files) becomes inefficient and ultimately unworkable. The ELK stack (Elasticsearch, Logstash, Kibana) emerged to solve this: it provides a complete solution for log collection, storage, analysis, and visualization.
This article walks you through building a production-ready ELK log analysis system from scratch, covering architecture design, component deployment, and configuration tuning.
💡 1. ELK Architecture Overview
1.1 Core Components
- Elasticsearch: a distributed search and analytics engine that stores and indexes the log data
- Logstash: a data-processing pipeline that collects, transforms, and ships log data
- Kibana: a data-visualization platform that provides log search and dashboarding
1.2 Extended Architecture (ELK+)
In real production environments, an enhanced version of the architecture is usually used:
```
App servers → Filebeat → Logstash → Elasticsearch → Kibana
                  ↑
      (optional) Kafka/RabbitMQ as a buffer layer
```
2. Environment Preparation and Planning
2.1 System Requirements
- OS: Ubuntu 20.04 LTS / CentOS 7+
- Memory: at least 8GB (16GB+ recommended for production)
- Disk: SSD with at least 50GB of free space
- Java: OpenJDK 11+
2.2 Example Server Plan

| Hostname | IP address | Role | Specs |
|---|---|---|---|
| elk-node1 | 192.168.1.10 | Elasticsearch master node | 8 cores / 16GB |
| elk-node2 | 192.168.1.11 | Elasticsearch data node | 8 cores / 16GB |
| logstash | 192.168.1.12 | Logstash processing node | 4 cores / 8GB |
| kibana | 192.168.1.13 | Kibana UI node | 4 cores / 8GB |
3. Step-by-Step Deployment
3.1 Install the Java Runtime

```bash
sudo apt update
sudo apt install -y openjdk-11-jdk

# Verify the installation
java -version
```
3.2 Install the Elasticsearch Cluster
3.2.1 Run on both elk-node1 and elk-node2:

```bash
# Import the Elastic GPG key and add the 7.x APT repository
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-7.x.list

sudo apt update
sudo apt install -y elasticsearch
```
3.2.2 Configure Elasticsearch (elk-node1)
Edit the configuration file /etc/elasticsearch/elasticsearch.yml:

```yaml
cluster.name: production-cluster
node.name: elk-node1
node.roles: [ master, data ]

path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch

network.host: 192.168.1.10
http.port: 9200

discovery.seed_hosts: ["192.168.1.10", "192.168.1.11"]
cluster.initial_master_nodes: ["elk-node1"]

xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
```
3.2.3 Configure Elasticsearch (elk-node2)

```yaml
cluster.name: production-cluster
node.name: elk-node2
node.roles: [ data ]
network.host: 192.168.1.11
discovery.seed_hosts: ["192.168.1.10", "192.168.1.11"]
# The xpack.security.* settings must match elk-node1,
# otherwise this node cannot join the cluster
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
```
3.2.4 Tune System Parameters

```bash
# Raise vm.max_map_count for Elasticsearch's memory-mapped indices
sudo sysctl -w vm.max_map_count=262144
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf

# Raise the open-file limit for the elasticsearch user
echo "elasticsearch - nofile 65535" | sudo tee -a /etc/security/limits.conf

sudo systemctl daemon-reload
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch
```
3.2.5 Set Passwords

```bash
# Generate random passwords for the built-in users (elastic, kibana_system, ...)
sudo /usr/share/elasticsearch/bin/elasticsearch-setup-passwords auto
```
3.3 Install Logstash
Run on the logstash server:

```bash
sudo apt install -y logstash
```
Create the configuration file /etc/logstash/conf.d/logstash.conf:

```conf
input {
  beats {
    port => 5044
    ssl  => false
  }
  tcp {
    port  => 5000
    codec => json_lines
  }
}

filter {
  if [type] == "nginx-access" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
  }
  if [type] == "java-app" {
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}" }
    }
  }
  mutate {
    remove_field => ["@version", "host"]
  }
}

output {
  elasticsearch {
    hosts    => ["http://192.168.1.10:9200", "http://192.168.1.11:9200"]
    index    => "logs-%{type}-%{+YYYY.MM.dd}"
    user     => "elastic"
    password => "your_password_here"
  }
  stdout {
    codec => rubydebug
  }
}
```
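For intuition, the java-app grok pattern above can be approximated with a plain Python regular expression. This is a rough sketch only: real grok patterns such as %{TIMESTAMP_ISO8601} are more permissive than the regex below, and the sample log line is made up.

```python
import re

# Rough Python equivalent of:
#   %{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}
PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?)\s+"
    r"(?P<level>TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\s+"
    r"(?P<message>.*)"
)

line = "2024-05-01 12:30:45,123 ERROR Connection refused to db-host:5432"
m = PATTERN.match(line)
print(m.groupdict())
# → {'timestamp': '2024-05-01 12:30:45,123', 'level': 'ERROR',
#    'message': 'Connection refused to db-host:5432'}
```

Lines that do not match the pattern end up with a `_grokparsefailure` tag in Logstash, which is worth watching for when you first wire up a new log format.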
Start Logstash:

```bash
sudo systemctl enable logstash
sudo systemctl start logstash
```
3.4 Install Kibana
Run on the kibana server:

```bash
sudo apt install -y kibana
```

Edit the configuration file /etc/kibana/kibana.yml:

```yaml
server.port: 5601
server.host: "192.168.1.13"
elasticsearch.hosts: ["http://192.168.1.10:9200", "http://192.168.1.11:9200"]
elasticsearch.username: "kibana_system"
elasticsearch.password: "your_kibana_password_here"
```
Start Kibana:

```bash
sudo systemctl enable kibana
sudo systemctl start kibana
```
3.5 Install Filebeat (on the application servers)

```bash
sudo apt install -y filebeat
sudo vim /etc/filebeat/filebeat.yml
```
Example configuration:

```yaml
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/nginx/access.log
    fields:
      type: nginx-access
    fields_under_root: true

  - type: log
    enabled: true
    paths:
      - /var/log/myapp/*.log
    fields:
      type: java-app
    fields_under_root: true

output.logstash:
  hosts: ["192.168.1.12:5044"]
```
Start Filebeat:

```bash
sudo systemctl enable filebeat
sudo systemctl start filebeat
```
4. Advanced Configuration and Tuning
4.1 Elasticsearch Index Lifecycle Management
Create a lifecycle policy and an index template:

```bash
# Lifecycle policy: roll over hot indices at 50GB or 30 days, delete 90 days later
curl -X PUT "http://192.168.1.10:9200/_ilm/policy/logs_policy" \
  -H 'Content-Type: application/json' \
  -u elastic:your_password \
  -d '{
    "policy": {
      "phases": {
        "hot": {
          "actions": {
            "rollover": { "max_size": "50GB", "max_age": "30d" }
          }
        },
        "delete": {
          "min_age": "90d",
          "actions": { "delete": {} }
        }
      }
    }
  }'

# Index template: apply the policy to all logs-* indices
curl -X PUT "http://192.168.1.10:9200/_index_template/logs_template" \
  -H 'Content-Type: application/json' \
  -u elastic:your_password \
  -d '{
    "index_patterns": ["logs-*"],
    "template": {
      "settings": {
        "number_of_shards": 3,
        "number_of_replicas": 1,
        "index.lifecycle.name": "logs_policy"
      }
    }
  }'
```
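To see what the policy above implies in practice, here is a back-of-the-envelope calculation (a sketch only; the daily ingest rates are made-up examples). An index rolls over when it first hits max_size (50GB) or max_age (30 days), and is deleted 90 days after rollover:

```python
# Day on which an index rolls over: size trigger (50GB / daily rate)
# or age trigger (30 days), whichever comes first.
def rollover_day(daily_gb: float, max_size_gb: float = 50, max_age_days: int = 30) -> float:
    return min(max_size_gb / daily_gb, max_age_days)

# Approximate days from index creation to deletion (delete phase: 90d after rollover).
def total_lifetime_days(daily_gb: float, delete_after_days: int = 90) -> float:
    return rollover_day(daily_gb) + delete_after_days

print(rollover_day(5))         # 10.0 -> size-triggered rollover at 5GB/day
print(rollover_day(1))         # 30   -> age-triggered rollover at 1GB/day
print(total_lifetime_days(5))  # 100.0
```

In other words, at higher ingest rates the effective retention shrinks toward 90 days, so size your disks for the busiest expected period.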
4.2 Logstash Performance Tuning
In /etc/logstash/logstash.yml:

```yaml
pipeline.workers: 4
pipeline.batch.size: 125
pipeline.batch.delay: 50
queue.type: persisted
queue.max_bytes: 4gb
```
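These settings bound how much data Logstash holds in memory at once: each of the pipeline.workers threads pulls batches of pipeline.batch.size events, so at most workers × batch size events are in flight simultaneously. A quick estimate (the 2KB average event size is an assumed example, not a measured value):

```python
workers = 4        # pipeline.workers
batch_size = 125   # pipeline.batch.size

# Upper bound on events being filtered/output at any moment
max_in_flight = workers * batch_size
print(max_in_flight)  # 500

# Rough in-flight memory, assuming an average event of 2KB
avg_event_bytes = 2 * 1024
approx_memory_mb = round(max_in_flight * avg_event_bytes / (1024 * 1024), 2)
print(approx_memory_mb)  # 0.98 (MB of event data, excluding JVM overhead)
```

The persisted queue (queue.max_bytes) is a separate on-disk buffer; it protects against Elasticsearch outages rather than adding to this in-memory bound.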
4.3 Monitoring the ELK Stack Itself
Enable self-monitoring collection:

```bash
curl -X PUT "http://192.168.1.10:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -u elastic:your_password \
  -d '{
    "persistent": {
      "xpack.monitoring.collection.enabled": true
    }
  }'
```
5. Kibana Dashboard Configuration
5.1 Create an Index Pattern
- Open http://192.168.1.13:5601
- Navigate to Management → Stack Management → Index Patterns
- Create the index pattern: logs-*
5.2 Create an Nginx Access-Log Dashboard
Save the following as nginx-dashboard.json:

```json
{
  "title": "Nginx Access Monitoring",
  "hits": 0,
  "description": "",
  "panelsJSON": "[{\"version\":\"7.14.0\",\"type\":\"metric\",\"gridData\":{\"x\":0,\"y\":0,\"w\":24,\"h\":15,\"i\":\"1\"},\"panelIndex\":\"1\",\"embeddableConfig\":{\"layerId\":\"layer1\",\"title\":\"Request count\"}}]",
  "optionsJSON": "{\"darkTheme\":false}",
  "version": 1
}
```
5.3 Common Query Examples
Find ERROR-level logs from the last hour:

```json
{
  "query": {
    "bool": {
      "must": [
        { "match": { "level": "ERROR" } },
        { "range": { "@timestamp": { "gte": "now-1h" } } }
      ]
    }
  }
}
```

Aggregate the top 10 HTTP response codes:

```json
{
  "size": 0,
  "aggs": {
    "status_codes": {
      "terms": {
        "field": "response.keyword",
        "size": 10
      }
    }
  }
}
```
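When the same queries are issued from scripts rather than the Kibana console, it can be cleaner to build the request body programmatically. A minimal sketch — the helper name and its parameters are our own invention, not an Elasticsearch API:

```python
import json

def errors_since(level: str = "ERROR", window: str = "now-1h") -> dict:
    """Build the bool query shown above for a given level and time window."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"level": level}},
                    {"range": {"@timestamp": {"gte": window}}},
                ]
            }
        }
    }

# The resulting body can be POSTed to /logs-*/_search with any HTTP client
body = errors_since("WARN", "now-24h")
print(json.dumps(body, indent=2))
```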
✨ 6. Troubleshooting and Maintenance
6.1 Common Problems
Elasticsearch fails to start

```bash
# Follow the service logs
journalctl -u elasticsearch -f

# Check whether the HTTP port is listening
sudo netstat -tlnp | grep 9200
```
Logstash pipeline is blocked

```bash
# Inspect pipeline statistics via the monitoring API
curl http://localhost:9600/_node/stats/pipeline

# Restart the service if the pipeline is stuck
sudo systemctl restart logstash
```
Running out of disk space

```bash
# List indices sorted by size, largest first
curl -X GET "http://192.168.1.10:9200/_cat/indices?v&s=store.size:desc"

# Delete old indices (example: all January 2023 nginx logs)
curl -X DELETE "http://192.168.1.10:9200/logs-nginx-2023.01.*"
```
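Instead of deleting by a hand-written wildcard, the cleanup can be scripted against the date-suffixed index names that the Logstash output above produces (logs-&lt;type&gt;-YYYY.MM.dd). A minimal sketch — the helper is hypothetical, and on a 7.x cluster the ILM policy from section 4.1 is the preferred mechanism:

```python
from datetime import datetime, timedelta

def expired_indices(names, retention_days, today):
    """Return the date-suffixed index names older than the retention window.

    Names without a parseable YYYY.MM.dd suffix (e.g. .kibana_1) are skipped.
    """
    cutoff = today - timedelta(days=retention_days)
    out = []
    for name in names:
        try:
            day = datetime.strptime(name.rsplit("-", 1)[-1], "%Y.%m.%d")
        except ValueError:
            continue
        if day < cutoff:
            out.append(name)
    return out

names = ["logs-nginx-access-2023.01.05", "logs-java-app-2023.03.01", ".kibana_1"]
print(expired_indices(names, 30, datetime(2023, 3, 15)))
# → ['logs-nginx-access-2023.01.05']
```

Each returned name could then be passed to a DELETE request like the one above; always dry-run the list before actually deleting.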
6.2 Routine Maintenance Commands

```bash
# Cluster health
curl -X GET "http://192.168.1.10:9200/_cluster/health?pretty"

# Node overview
curl -X GET "http://192.168.1.10:9200/_cat/nodes?v"

# Index overview
curl -X GET "http://192.168.1.10:9200/_cat/indices?v"

# Clear caches
curl -X POST "http://192.168.1.10:9200/_cache/clear"
```
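The health endpoint is also a natural hook for a cron-driven check. The sketch below shows only the decision logic applied to a parsed response body; the field names come from the /_cluster/health API, while the sample document and the helper itself are illustrative:

```python
import json

def needs_alert(health: dict) -> bool:
    """Alert when the cluster is not green or any shards are unassigned."""
    return health.get("status") != "green" or health.get("unassigned_shards", 0) > 0

# Example response body as GET /_cluster/health might return it
sample = json.loads(
    '{"cluster_name": "production-cluster", "status": "yellow", "unassigned_shards": 2}'
)
print(needs_alert(sample))  # True
```

A yellow status (replicas unassigned) is common on single-node test setups but should trigger investigation in the two-node cluster built here.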
✨ 7. Security Hardening Recommendations
Enable HTTPS

```yaml
# In elasticsearch.yml:
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.keystore.path: elastic-certificates.p12

# In kibana.yml:
server.ssl.enabled: true
server.ssl.certificate: /path/to/your/cert.crt
server.ssl.key: /path/to/your/cert.key
```
Restrict access by IP

```yaml
# In elasticsearch.yml:
xpack.security.transport.filter.allow: "192.168.1.0/24"
xpack.security.http.filter.allow: "192.168.1.0/24"
```
Rotate passwords regularly

```bash
curl -X POST "http://192.168.1.10:9200/_security/user/elastic/_password" \
  -H 'Content-Type: application/json' \
  -u elastic:old_password \
  -d '{"password": "new_secure_password"}'
```
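For the new password itself, something like Python's secrets module produces a cryptographically sound random string (a sketch; adjust the alphabet and length to your own password policy):

```python
import secrets
import string

# Allowed characters for generated passwords (an example policy)
ALPHABET = string.ascii_letters + string.digits + "!@#%^*-_"

def gen_password(length: int = 24) -> str:
    """Generate a random password suitable for the rotation call above."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

print(gen_password())  # random each run
```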
Conclusion
Following the steps in this article, you now have a complete, production-ready ELK log analysis system. It lets you monitor application state in real time and mine historical data for latent problems, giving system optimization a data-driven footing.
Remember that an ELK deployment needs continual adjustment to match your actual workload. Start from the basic configuration and, as your understanding of the system deepens, gradually add more advanced capabilities such as alerting and machine-learning-based anomaly detection.
Good log management is the foundation of a stable system, and ELK is a powerful tool for building that foundation. Enjoy your log analysis journey!
Originally written by the 零点119 team; please credit the source when republishing. Article ID: 7df460f4