Clean up Chinese comments and add comprehensive English README
- Replace all Chinese comments with English equivalents in:
  - src/health_monitor.rs
  - src/lib.rs
  - tests/integration_test.rs
- Add comprehensive README.md with:
  - Project overview and features
  - Architecture diagram
  - Installation and configuration guide
  - Data format specifications
  - Health monitoring documentation
  - Troubleshooting guide
parent 2a9e34d345
commit e2bd5e9be5
README.md (Normal file, +252 lines)
@@ -0,0 +1,252 @@
# Network Monitor

A robust network monitoring service written in Rust. It tracks traffic from two sources, a Clash proxy and an OpenWRT router's WAN interface, and broadcasts real-time statistics to clients over UDP. The service is designed for high availability, with automatic retries and built-in health monitoring.

## Features

### 🔄 Dual Network Monitoring

- **Clash Proxy Monitoring**: Connects to the Clash proxy over WebSocket to collect proxy traffic statistics
- **WAN Interface Monitoring**: Polls the OpenWRT router's LuCI interface for WAN traffic data

### 🚀 High Availability

- **Infinite Retry Mechanism**: Automatically recovers from network failures and service interruptions
- **Health Monitoring**: Per-service health tracking with detailed statistics and log-based alerts
- **Exponential Backoff**: Retry delays grow exponentially, with configurable bounds and optional jitter

### 📡 Real-time Data Broadcasting

- **UDP Server**: Broadcasts network statistics to connected clients
- **Client Management**: Automatic client discovery and connection management
- **Data Formats**: Compact, fixed-size binary payloads for efficient transmission

### 🛡️ Robust Error Handling

- **Connection Timeouts**: Configurable timeouts for all network operations
- **Graceful Degradation**: Continues operating even when one monitoring source fails
- **Detailed Logging**: Comprehensive logging for debugging and monitoring

## Architecture

```
┌─────────────────┐    WebSocket    ┌─────────────────┐
│   Clash Proxy   │◄───────────────►│                 │
└─────────────────┘                 │                 │
                                    │     Network     │     UDP
┌─────────────────┐    HTTP/LuCI    │     Monitor     │◄──────────┐
│ OpenWRT Router  │◄───────────────►│     Service     │           │
└─────────────────┘                 │                 │           │
                                    └─────────────────┘           │
                                                                  │
┌─────────────────┐    UDP Data     ┌─────────────────┐           │
│    Client 1     │◄───────────────►│   UDP Server    │◄──────────┘
└─────────────────┘                 └─────────────────┘
┌─────────────────┐
│    Client 2     │◄───────────────►
└─────────────────┘
```

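The fan-out from the UDP server to its clients can be illustrated with the standard library's `UdpSocket`. This is a hypothetical sketch, not the code in `udp_server.rs`. It assumes that a client registers simply by sending any datagram to the server, and that the server then pushes each statistics payload to every address it has seen. `ClientRegistry`, `accept_pending`, `broadcast`, and `demo` are names invented for this example. The example binds an ephemeral loopback port instead of 17890.

```rust
use std::collections::HashSet;
use std::io;
use std::net::{SocketAddr, UdpSocket};
use std::thread;
use std::time::Duration;

/// Hypothetical registry of clients that have sent the server a datagram.
struct ClientRegistry {
    clients: HashSet<SocketAddr>,
}

impl ClientRegistry {
    fn new() -> Self {
        Self { clients: HashSet::new() }
    }

    /// Drains pending datagrams and records each sender as a client.
    /// Expects `socket` to be in non-blocking mode.
    fn accept_pending(&mut self, socket: &UdpSocket) -> io::Result<()> {
        let mut buf = [0u8; 64];
        loop {
            match socket.recv_from(&mut buf) {
                Ok((_, addr)) => {
                    self.clients.insert(addr);
                }
                Err(e) if e.kind() == io::ErrorKind::WouldBlock => return Ok(()),
                Err(e) => return Err(e),
            }
        }
    }

    /// Sends `payload` to every registered client; returns the number of successful sends.
    fn broadcast(&self, socket: &UdpSocket, payload: &[u8]) -> usize {
        let mut sent = 0;
        for addr in &self.clients {
            if socket.send_to(payload, addr).is_ok() {
                sent += 1;
            }
        }
        sent
    }
}

/// Runs one register-then-broadcast round on loopback.
/// Returns (registered clients, successful sends, bytes received by the client).
fn demo() -> io::Result<(usize, usize, usize)> {
    // Ephemeral port so the demo never collides with a running service on 17890.
    let server = UdpSocket::bind("127.0.0.1:0")?;
    server.set_nonblocking(true)?;

    let client = UdpSocket::bind("127.0.0.1:0")?;
    client.set_read_timeout(Some(Duration::from_secs(1)))?;
    client.send_to(b"hello", server.local_addr()?)?;

    let mut registry = ClientRegistry::new();
    // Poll briefly: loopback delivery is fast but not guaranteed to be instantaneous.
    for _ in 0..100 {
        registry.accept_pending(&server)?;
        if !registry.clients.is_empty() {
            break;
        }
        thread::sleep(Duration::from_millis(10));
    }

    // A 16-byte payload, the same size as a WAN traffic frame.
    let sent = registry.broadcast(&server, &[0u8; 16]);
    let mut buf = [0u8; 64];
    let (len, _) = client.recv_from(&mut buf)?;
    Ok((registry.clients.len(), sent, len))
}

fn main() -> io::Result<()> {
    let (registered, sent, received) = demo()?;
    println!("registered={registered} sent={sent} received={received} bytes");
    Ok(())
}
```

A long-running server would also need to expire clients that go silent. This README does not document how the real service discovers or drops clients.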
## Installation

### Prerequisites

- Rust 1.70+ (for building from source)
- Docker (for containerized deployment)

### Building from Source

```bash
# Clone the repository
git clone <repository-url>
cd network-monitor

# Build the project
cargo build --release

# Run tests
cargo test

# Run the service
cargo run
```

### Docker Deployment

```bash
# Build the Docker image
docker build -t network-monitor .

# Run the container
docker run -d \
  --name network-monitor \
  -p 17890:17890/udp \
  -e CLASH_URL="ws://192.168.1.1:9090/connections?token=your-token" \
  -e LUCI_URL="http://192.168.1.1/cgi-bin/luci" \
  -e LUCI_USERNAME="root" \
  -e LUCI_PASSWORD="your-password" \
  network-monitor
```

## Configuration

The service can be configured via command-line arguments or environment variables:

| Parameter | Environment Variable | Default Value | Description |
|-----------|---------------------|---------------|-------------|
| `-c, --clash-url` | `CLASH_URL` | `ws://192.168.1.1:9090/connections?token=123456` | Clash WebSocket URL |
| `-p, --listen-port` | `LISTEN_PORT` | `17890` | UDP server listen port |
| `-l, --luci-url` | `LUCI_URL` | `http://192.168.1.1/cgi-bin/luci` | OpenWRT LuCI base URL |
| `-u, --luci-username` | `LUCI_USERNAME` | `root` | LuCI authentication username |
| `-P, --luci-password` | `LUCI_PASSWORD` | `123456` | LuCI authentication password |

### Environment File

Create a `.env` file in the project root:

```env
CLASH_URL=ws://192.168.1.1:9090/connections?token=your-clash-token
LISTEN_PORT=17890
LUCI_URL=http://192.168.1.1/cgi-bin/luci
LUCI_USERNAME=root
LUCI_PASSWORD=your-router-password
```

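The fallback behaviour in the table (use the environment variable if it is set, otherwise the default) can be sketched with `std::env` alone. This is an illustration under assumptions. The `Config` struct and `from_lookup` are hypothetical, command-line flags are not handled, and the real service may parse its options with a dedicated crate. Loading the `.env` file is also out of scope here. Reading values through a lookup closure keeps the logic testable without modifying the process environment.

```rust
use std::env;

/// Hypothetical settings struct mirroring the configuration table.
#[derive(Debug, Clone, PartialEq)]
struct Config {
    clash_url: String,
    listen_port: u16,
    luci_url: String,
    luci_username: String,
    luci_password: String,
}

impl Config {
    /// Builds a config from any key lookup, falling back to the documented defaults.
    fn from_lookup<F>(lookup: F) -> Result<Self, String>
    where
        F: Fn(&str) -> Option<String>,
    {
        let get = |key: &str, default: &str| lookup(key).unwrap_or_else(|| default.to_string());

        // The port is the only value that needs validation beyond "is it set".
        let port_text = get("LISTEN_PORT", "17890");
        let listen_port = port_text
            .parse::<u16>()
            .map_err(|e| format!("invalid LISTEN_PORT {port_text:?}: {e}"))?;

        Ok(Config {
            clash_url: get("CLASH_URL", "ws://192.168.1.1:9090/connections?token=123456"),
            listen_port,
            luci_url: get("LUCI_URL", "http://192.168.1.1/cgi-bin/luci"),
            luci_username: get("LUCI_USERNAME", "root"),
            luci_password: get("LUCI_PASSWORD", "123456"),
        })
    }

    /// Reads the real process environment (does not load a `.env` file).
    fn from_env() -> Result<Self, String> {
        Self::from_lookup(|key| env::var(key).ok())
    }
}

fn main() {
    match Config::from_env() {
        Ok(cfg) => println!("listening on UDP port {}", cfg.listen_port),
        Err(e) => eprintln!("configuration error: {e}"),
    }
}
```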
## Data Formats

### Clash Traffic Data (32 bytes)

```
Bytes 0-7:   Direct upload speed (u64, little-endian)
Bytes 8-15:  Direct download speed (u64, little-endian)
Bytes 16-23: Proxy upload speed (u64, little-endian)
Bytes 24-31: Proxy download speed (u64, little-endian)
```

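A receiver can decode this layout with `u64::from_le_bytes`. The sketch below assumes one frame per datagram. The `ClashTraffic` struct and its field names are hypothetical. Because the unit of the speed values is not stated above, the code treats them as opaque counters.

```rust
use std::convert::TryInto;

/// Hypothetical view of the 32-byte Clash traffic frame.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ClashTraffic {
    direct_up: u64,
    direct_down: u64,
    proxy_up: u64,
    proxy_down: u64,
}

impl ClashTraffic {
    const LEN: usize = 32;

    /// Decodes a frame, returning `None` unless `buf` is exactly 32 bytes.
    fn decode(buf: &[u8]) -> Option<Self> {
        if buf.len() != Self::LEN {
            return None;
        }
        // Reads bytes [off, off + 8) as a little-endian u64.
        let field = |off: usize| u64::from_le_bytes(buf[off..off + 8].try_into().unwrap());
        Some(Self {
            direct_up: field(0),
            direct_down: field(8),
            proxy_up: field(16),
            proxy_down: field(24),
        })
    }

    /// Encodes the frame in the same layout (handy for tests or a mock server).
    fn encode(&self) -> [u8; 32] {
        let mut out = [0u8; 32];
        out[0..8].copy_from_slice(&self.direct_up.to_le_bytes());
        out[8..16].copy_from_slice(&self.direct_down.to_le_bytes());
        out[16..24].copy_from_slice(&self.proxy_up.to_le_bytes());
        out[24..32].copy_from_slice(&self.proxy_down.to_le_bytes());
        out
    }
}

fn main() {
    let sample = ClashTraffic { direct_up: 1, direct_down: 2, proxy_up: 3, proxy_down: 4 };
    let decoded = ClashTraffic::decode(&sample.encode());
    println!("{decoded:?}");
}
```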
### WAN Traffic Data (16 bytes)

```
Bytes 0-7:  WAN upload speed (u64, little-endian)
Bytes 8-15: WAN download speed (u64, little-endian)
```

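If a client receives both frame types on the same socket, the fixed sizes (16 and 32 bytes) are enough to tell them apart. That shared-socket setup is an assumption, not something this README specifies. `Frame`, `le_u64s`, and `parse_frame` are hypothetical names used only in this sketch.

```rust
use std::convert::TryInto;

/// Hypothetical decoded datagram, classified by its length.
#[derive(Debug, PartialEq, Eq)]
enum Frame {
    /// 16 bytes: WAN upload, WAN download.
    Wan { up: u64, down: u64 },
    /// 32 bytes: direct up, direct down, proxy up, proxy down.
    Clash([u64; 4]),
}

/// Splits `buf` into little-endian u64 values (any trailing partial chunk is ignored).
fn le_u64s(buf: &[u8]) -> Vec<u64> {
    buf.chunks_exact(8)
        .map(|c| u64::from_le_bytes(c.try_into().unwrap()))
        .collect()
}

/// Classifies a datagram by length; any other size is rejected.
fn parse_frame(buf: &[u8]) -> Option<Frame> {
    match buf.len() {
        16 => {
            let v = le_u64s(buf);
            Some(Frame::Wan { up: v[0], down: v[1] })
        }
        32 => {
            let v = le_u64s(buf);
            Some(Frame::Clash([v[0], v[1], v[2], v[3]]))
        }
        _ => None,
    }
}

fn main() {
    let mut wan = Vec::new();
    wan.extend_from_slice(&1500u64.to_le_bytes()); // upload
    wan.extend_from_slice(&64000u64.to_le_bytes()); // download
    println!("{:?}", parse_frame(&wan));
}
```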
## Health Monitoring

The service includes comprehensive health monitoring with the following metrics:

- **Connection Status**: Real-time health status for each service
- **Uptime Percentage**: Success rate over time
- **Failure Tracking**: Consecutive failure counts and timestamps
- **Performance Metrics**: Total attempts, successes, and failures

Health reports are logged every minute with detailed statistics.

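The metrics above map naturally onto a handful of counters per service. The sketch below is illustrative only. `HealthStats` is not the crate's `ConnectionHealth`, and the unhealthy threshold of three consecutive failures is an assumption. The error threshold of more than 10 consecutive failures mirrors the check visible in `health_monitor.rs`. The uptime figure is computed as successful attempts divided by total attempts.

```rust
use std::time::SystemTime;

/// Hypothetical per-service health counters (not the crate's `ConnectionHealth`).
#[derive(Debug, Default)]
struct HealthStats {
    total_attempts: u64,
    successes: u64,
    failures: u64,
    consecutive_failures: u32,
    last_failure: Option<SystemTime>,
}

impl HealthStats {
    /// Assumed threshold: a service counts as unhealthy after this many failures in a row.
    const UNHEALTHY_AFTER: u32 = 3;
    /// Mirrors the `> 10` consecutive-failure error check in the health monitor.
    const ERROR_AFTER: u32 = 10;

    fn record_success(&mut self) {
        self.total_attempts += 1;
        self.successes += 1;
        self.consecutive_failures = 0;
    }

    fn record_failure(&mut self) {
        self.total_attempts += 1;
        self.failures += 1;
        self.consecutive_failures += 1;
        self.last_failure = Some(SystemTime::now());
    }

    /// Share of successful attempts, in percent (100.0 before any attempt).
    fn uptime_percent(&self) -> f64 {
        if self.total_attempts == 0 {
            100.0
        } else {
            self.successes as f64 * 100.0 / self.total_attempts as f64
        }
    }

    fn is_healthy(&self) -> bool {
        self.consecutive_failures < Self::UNHEALTHY_AFTER
    }

    fn needs_error_alert(&self) -> bool {
        self.consecutive_failures > Self::ERROR_AFTER
    }
}

fn main() {
    let mut wan = HealthStats::default();
    wan.record_success();
    wan.record_failure();
    wan.record_failure();
    wan.record_success();
    println!(
        "uptime {:.1}%, healthy: {}, consecutive failures: {}",
        wan.uptime_percent(),
        wan.is_healthy(),
        wan.consecutive_failures
    );
}
```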
## Retry Strategy

The service retries failed operations using the following mechanisms:

- **Infinite Retries**: Critical services keep retrying indefinitely instead of giving up
- **Exponential Backoff**: Delays grow exponentially with each consecutive failure
- **Jitter**: Randomized delays prevent thundering-herd effects
- **Configurable Limits**: Maximum delays and retry counts can be customized

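One common way to combine exponential backoff with jitter works in three steps: compute `initial_delay * backoff_multiplier^n` for the n-th retry, cap the result at `max_delay`, and then randomize part of it. The field names below match those used in the integration tests. The formula and the "equal jitter" scheme (keep half the delay, randomize the other half) are assumptions, not the crate's documented behaviour. The caller supplies the random sample in [0, 1), so the sketch needs no external crates.

```rust
use std::time::Duration;

/// Hypothetical backoff settings; field names follow the integration tests.
struct Backoff {
    initial_delay: Duration,
    max_delay: Duration,
    backoff_multiplier: f64,
    jitter: bool,
}

impl Backoff {
    /// Delay before retry number `retry` (0-based).
    /// `random` must be in [0, 1); it is ignored when jitter is disabled.
    fn delay(&self, retry: u32, random: f64) -> Duration {
        // Clamp the exponent so very long retry loops cannot overflow the i32 cast.
        let exponent = retry.min(64) as i32;
        let base = self.initial_delay.as_secs_f64() * self.backoff_multiplier.powi(exponent);
        let capped = base.min(self.max_delay.as_secs_f64());
        let scaled = if self.jitter {
            // Assumed "equal jitter": keep at least half of the computed delay.
            capped * (0.5 + 0.5 * random)
        } else {
            capped
        };
        Duration::from_secs_f64(scaled)
    }
}

fn main() {
    let b = Backoff {
        initial_delay: Duration::from_millis(500),
        max_delay: Duration::from_secs(30),
        backoff_multiplier: 2.0,
        jitter: false,
    };
    for retry in 0..8 {
        println!("retry {retry}: {:?}", b.delay(retry, 0.0));
    }
}
```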
### Retry Configurations

- **Fast Retry**: For lightweight operations (5 attempts, 100ms-5s delays)
- **Default Retry**: Balanced approach (10 attempts, 500ms-30s delays)
- **Slow Retry**: For heavyweight operations (15 attempts, 1s-60s delays)
- **Infinite Retry**: For critical services (unlimited attempts)

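The presets differ only in their data, so they can share one retry loop. This blocking sketch uses `std::thread::sleep`, whereas the service itself runs on `tokio`. The `RetryPreset` type, the `retry_blocking` function, the doubling multiplier, and the delay bounds for the infinite preset are all assumptions. Only the attempt counts and delay ranges of the first three presets come from the list above.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical preset: attempt limit (None = unlimited) and delay bounds.
#[derive(Debug, Clone, Copy)]
struct RetryPreset {
    max_attempts: Option<u32>,
    initial_delay: Duration,
    max_delay: Duration,
}

impl RetryPreset {
    const FAST: Self = Self { max_attempts: Some(5), initial_delay: Duration::from_millis(100), max_delay: Duration::from_secs(5) };
    const DEFAULT: Self = Self { max_attempts: Some(10), initial_delay: Duration::from_millis(500), max_delay: Duration::from_secs(30) };
    const SLOW: Self = Self { max_attempts: Some(15), initial_delay: Duration::from_secs(1), max_delay: Duration::from_secs(60) };
    // Delay bounds for the infinite preset are not documented; these are placeholders.
    const INFINITE: Self = Self { max_attempts: None, initial_delay: Duration::from_secs(1), max_delay: Duration::from_secs(60) };
}

/// Calls `op` (with the 1-based attempt number) until it succeeds or the attempt
/// limit is reached, doubling the delay after each failure up to `max_delay`.
fn retry_blocking<T, E, F>(preset: RetryPreset, mut op: F) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    let mut delay = preset.initial_delay;
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(err) => {
                // Give up only when a finite limit exists and has been reached.
                if preset.max_attempts.is_some_and(|max| attempt >= max) {
                    return Err(err);
                }
                thread::sleep(delay);
                delay = delay.saturating_mul(2).min(preset.max_delay);
                attempt += 1;
            }
        }
    }
}

fn main() {
    // Tiny delays so the demo finishes instantly; the named presets are far slower.
    let quick = RetryPreset { max_attempts: Some(5), initial_delay: Duration::from_millis(1), max_delay: Duration::from_millis(4) };
    let result: Result<&str, String> = retry_blocking(quick, |attempt| {
        if attempt < 3 { Err(format!("network error on attempt {attempt}")) } else { Ok("connected") }
    });
    println!("{result:?}; fast preset allows {:?} attempts", RetryPreset::FAST.max_attempts);
    println!("default: {:?}, slow: {:?}", RetryPreset::DEFAULT, RetryPreset::SLOW);
}
```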
## Logging

The service uses structured logging with multiple levels:

- **INFO**: Normal operation events and health reports
- **WARN**: Service health issues and recoverable errors
- **ERROR**: Critical failures and persistent issues
- **DEBUG**: Detailed operation information

Set the `RUST_LOG` environment variable to control log levels:

```bash
export RUST_LOG=info  # or debug, warn, error
```

## Development

### Project Structure

```
src/
├── main.rs            # Application entry point
├── lib.rs             # Library exports
├── clash_conn_msg.rs  # Clash message structures
├── health_monitor.rs  # Health monitoring system
├── retry.rs           # Retry mechanism implementation
├── statistics.rs      # Traffic statistics processing
├── udp_server.rs      # UDP broadcasting server
└── wan.rs             # WAN traffic polling
tests/
└── integration_test.rs  # Integration tests
```

### Running Tests

```bash
# Run all tests
cargo test

# Run with output
cargo test -- --nocapture

# Run specific test
cargo test test_network_failure_recovery
```

### Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Ensure all tests pass
6. Submit a pull request

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Troubleshooting

### Common Issues

1. **WebSocket Connection Failures**
   - Verify that Clash is running and accessible
   - Check the WebSocket URL and authentication token
   - Ensure network connectivity to the Clash instance

2. **LuCI Authentication Failures**
   - Verify the router credentials
   - Check whether the router is accessible
   - Ensure the LuCI interface is enabled

3. **UDP Client Connection Issues**
   - Verify that the UDP port is not blocked by a firewall
   - Check whether the service is binding to the correct interface
   - Ensure clients are connecting to the correct port

### Debug Mode

Enable debug logging for detailed troubleshooting:

```bash
RUST_LOG=debug cargo run
```

This will provide detailed information about:

- Connection attempts and failures
- Retry mechanisms in action
- Health monitoring decisions
- UDP client management
- Data processing and broadcasting

src/health_monitor.rs
@@ -115,7 +115,7 @@ impl HealthMonitor {
.get_or_init(|| async {
let monitor = HealthMonitor::new();

// Start health status reporting task
let monitor_clone = monitor.clone();
tokio::spawn(async move {
monitor_clone.start_health_reporting().await;
@@ -192,7 +192,7 @@ impl HealthMonitor {
self.log_service_health("WebSocket", &websocket_health);
self.log_service_health("WAN Polling", &wan_health);

// Warn if any service is unhealthy
if !websocket_health.is_healthy {
warn!("WebSocket service is unhealthy! Consecutive failures: {}", websocket_health.consecutive_failures);
}
@@ -200,7 +200,7 @@ impl HealthMonitor {
warn!("WAN Polling service is unhealthy! Consecutive failures: {}", wan_health.consecutive_failures);
}

// Raise an error-level alert if there are too many consecutive failures
if websocket_health.consecutive_failures > 10 {
error!("WebSocket service has {} consecutive failures!", websocket_health.consecutive_failures);
}
src/lib.rs
@@ -1,6 +1,6 @@
pub mod retry;
pub mod health_monitor;

// Re-export commonly used types and functions
pub use retry::{RetryConfig, Retrier, retry_with_config, retry, retry_forever};
pub use health_monitor::{HealthMonitor, ServiceType, ConnectionHealth};
tests/integration_test.rs
@@ -4,10 +4,10 @@ use std::time::Duration;
use tokio::time::sleep;
use network_monitor::retry::{RetryConfig, retry_with_config};

/// Test simulating network failure recovery
#[tokio::test]
async fn test_network_failure_recovery() {
// Simulate an operation that fails a few times then succeeds
let attempt_count = Arc::new(AtomicU32::new(0));
let max_failures = 3;
@@ -16,7 +16,7 @@ async fn test_network_failure_recovery() {
initial_delay: Duration::from_millis(10),
max_delay: Duration::from_millis(100),
backoff_multiplier: 1.5,
jitter: false, // Disable jitter for more predictable testing
};

let attempt_count_clone = attempt_count.clone();
@@ -26,10 +26,10 @@ async fn test_network_failure_recovery() {
let current_attempt = attempt_count.fetch_add(1, Ordering::SeqCst) + 1;

if current_attempt <= max_failures {
// Simulate network error
Err(format!("Network error on attempt {}", current_attempt))
} else {
// Simulate successful recovery
Ok(format!("Success on attempt {}", current_attempt))
}
}
@@ -40,7 +40,7 @@ async fn test_network_failure_recovery() {
assert_eq!(attempt_count.load(Ordering::SeqCst), 4);
}

/// Test connection timeout scenario
#[tokio::test]
async fn test_connection_timeout_scenario() {
let config = RetryConfig {
@@ -59,17 +59,17 @@ async fn test_connection_timeout_scenario() {
async move {
attempt_count.fetch_add(1, Ordering::SeqCst);

// Simulate connection timeout
sleep(Duration::from_millis(1)).await;
Err("Connection timeout")
}
}).await;

assert!(result.is_err());
assert_eq!(attempt_count.load(Ordering::SeqCst), 3); // Should have attempted 3 times
}

/// Test fast recovery scenario
#[tokio::test]
async fn test_fast_recovery() {
let config = RetryConfig::fast();
@@ -95,7 +95,7 @@ async fn test_fast_recovery() {
assert_eq!(attempt_count.load(Ordering::SeqCst), 2);
}

/// Test slow retry scenario
#[tokio::test]
async fn test_slow_retry_scenario() {
let config = RetryConfig::slow();
@@ -121,7 +121,7 @@ async fn test_slow_retry_scenario() {
assert_eq!(attempt_count.load(Ordering::SeqCst), 3);
}

/// Test maximum retry limit
#[tokio::test]
async fn test_max_retry_limit() {
let config = RetryConfig {