Clean up Chinese comments and add comprehensive English README

- Replace all Chinese comments with English equivalents in:
  - src/health_monitor.rs
  - src/lib.rs
  - tests/integration_test.rs
- Add comprehensive README.md with:
  - Project overview and features
  - Architecture diagram
  - Installation and configuration guide
  - Data format specifications
  - Health monitoring documentation
  - Troubleshooting guide
Ivan Li 2025-06-30 17:40:37 +08:00
parent 2a9e34d345
commit e2bd5e9be5
4 changed files with 267 additions and 15 deletions

README.md (new file)

@@ -0,0 +1,252 @@
# Network Monitor
A robust network monitoring service written in Rust that tracks network traffic from multiple sources and provides real-time data via UDP broadcasting. This service is designed for high availability with automatic retry mechanisms and comprehensive health monitoring.
## Features
### 🔄 Dual Network Monitoring
- **Clash Proxy Monitoring**: Connects to Clash proxy via WebSocket to monitor proxy traffic statistics
- **WAN Interface Monitoring**: Polls OpenWRT/LuCI router interfaces for WAN traffic data
### 🚀 High Availability
- **Infinite Retry Mechanism**: Automatically recovers from network failures and service interruptions
- **Health Monitoring**: Comprehensive health tracking with detailed statistics and alerting
- **Exponential Backoff**: Smart retry strategy with configurable delays and jitter
### 📡 Real-time Data Broadcasting
- **UDP Server**: Broadcasts network statistics to connected clients
- **Client Management**: Automatic client discovery and connection management
- **Data Formats**: Structured binary data for efficient transmission
### 🛡️ Robust Error Handling
- **Connection Timeouts**: Configurable timeouts for all network operations
- **Graceful Degradation**: Continues operation even when one monitoring source fails
- **Detailed Logging**: Comprehensive logging for debugging and monitoring
## Architecture
```
┌─────────────────┐    WebSocket    ┌─────────────────┐
│   Clash Proxy   │◄───────────────►│                 │
└─────────────────┘                 │                 │
                                    │     Network     │      UDP
┌─────────────────┐    HTTP/LuCI    │     Monitor     │◄──────────┐
│ OpenWRT Router  │◄───────────────►│     Service     │           │
└─────────────────┘                 │                 │           │
                                    └─────────────────┘           │
┌─────────────────┐    UDP Data     ┌─────────────────┐           │
│    Client 1     │◄───────────────►│   UDP Server    │◄──────────┘
└─────────────────┘                 └─────────────────┘
┌─────────────────┐                          ▲
│    Client 2     │◄─────────────────────────┘
└─────────────────┘
```
## Installation
### Prerequisites
- Rust 1.70+ (for building from source)
- Docker (for containerized deployment)
### Building from Source
```bash
# Clone the repository
git clone <repository-url>
cd network-monitor
# Build the project
cargo build --release
# Run tests
cargo test
# Run the service
cargo run
```
### Docker Deployment
```bash
# Build the Docker image
docker build -t network-monitor .
# Run the container
docker run -d \
  --name network-monitor \
  -p 17890:17890/udp \
  -e CLASH_URL="ws://192.168.1.1:9090/connections?token=your-token" \
  -e LUCI_URL="http://192.168.1.1/cgi-bin/luci" \
  -e LUCI_USERNAME="root" \
  -e LUCI_PASSWORD="your-password" \
  network-monitor
```
## Configuration
The service can be configured via command-line arguments or environment variables:
| Parameter | Environment Variable | Default Value | Description |
|-----------|---------------------|---------------|-------------|
| `-c, --clash-url` | `CLASH_URL` | `ws://192.168.1.1:9090/connections?token=123456` | Clash WebSocket URL |
| `-p, --listen-port` | `LISTEN_PORT` | `17890` | UDP server listen port |
| `-l, --luci-url` | `LUCI_URL` | `http://192.168.1.1/cgi-bin/luci` | OpenWRT LuCI base URL |
| `-u, --luci-username` | `LUCI_USERNAME` | `root` | LuCI authentication username |
| `-P, --luci-password` | `LUCI_PASSWORD` | `123456` | LuCI authentication password |
### Environment File
Create a `.env` file in the project root:
```env
CLASH_URL=ws://192.168.1.1:9090/connections?token=your-clash-token
LISTEN_PORT=17890
LUCI_URL=http://192.168.1.1/cgi-bin/luci
LUCI_USERNAME=root
LUCI_PASSWORD=your-router-password
```
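For reference, the option table above maps onto a fairly standard derive-based CLI definition. The sketch below is illustrative only: it assumes the `clap` crate with the `derive` and `env` features, and the real definitions (presumably in `src/main.rs`) may differ.

```rust
use clap::Parser;

/// Illustrative argument struct matching the documented options.
/// Flag names, env vars and defaults mirror the table above; this is
/// not the project's actual code.
#[derive(Parser, Debug)]
struct Args {
    #[arg(short = 'c', long = "clash-url", env = "CLASH_URL",
          default_value = "ws://192.168.1.1:9090/connections?token=123456")]
    clash_url: String,

    #[arg(short = 'p', long = "listen-port", env = "LISTEN_PORT", default_value_t = 17890)]
    listen_port: u16,

    #[arg(short = 'l', long = "luci-url", env = "LUCI_URL",
          default_value = "http://192.168.1.1/cgi-bin/luci")]
    luci_url: String,

    #[arg(short = 'u', long = "luci-username", env = "LUCI_USERNAME", default_value = "root")]
    luci_username: String,

    #[arg(short = 'P', long = "luci-password", env = "LUCI_PASSWORD", default_value = "123456")]
    luci_password: String,
}

fn main() {
    let args = Args::parse();
    println!("listening on UDP port {}", args.listen_port);
}
```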
## Data Formats
### Clash Traffic Data (32 bytes)
```
Bytes 0-7: Direct upload speed (u64, little-endian)
Bytes 8-15: Direct download speed (u64, little-endian)
Bytes 16-23: Proxy upload speed (u64, little-endian)
Bytes 24-31: Proxy download speed (u64, little-endian)
```
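As a rough illustration, a Rust client could decode this packet as follows. The struct and function names here are invented for the example and are not part of the crate's API.

```rust
/// Speeds decoded from one 32-byte Clash traffic datagram (illustrative type).
struct ClashTraffic {
    direct_up: u64,
    direct_down: u64,
    proxy_up: u64,
    proxy_down: u64,
}

fn parse_clash_packet(buf: &[u8]) -> Option<ClashTraffic> {
    if buf.len() < 32 {
        return None;
    }
    // Each field is a little-endian u64 at a fixed offset, per the layout above.
    let read = |i: usize| u64::from_le_bytes(buf[i..i + 8].try_into().unwrap());
    Some(ClashTraffic {
        direct_up: read(0),
        direct_down: read(8),
        proxy_up: read(16),
        proxy_down: read(24),
    })
}
```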
### WAN Traffic Data (16 bytes)
```
Bytes 0-7: WAN upload speed (u64, little-endian)
Bytes 8-15: WAN download speed (u64, little-endian)
```
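The WAN packet can be decoded the same way. Since both formats arrive over the same UDP socket, one simple way for a client to tell them apart is by datagram length; note that this dispatch rule, the hello packet, and the address below are assumptions for the sketch, not a documented protocol.

```rust
use std::net::UdpSocket;

// Illustrative client loop; server address and registration packet are placeholders.
fn run_client(server: &str) -> std::io::Result<()> {
    let socket = UdpSocket::bind("0.0.0.0:0")?;
    socket.connect(server)?; // e.g. "192.168.1.2:17890"
    socket.send(b"hi")?; // placeholder hello; the actual registration handshake (if any) is not documented here
    let mut buf = [0u8; 64];
    loop {
        let len = socket.recv(&mut buf)?;
        match len {
            32 => println!("Clash traffic datagram: {:?}", &buf[..32]),
            16 => println!("WAN traffic datagram: {:?}", &buf[..16]),
            n => eprintln!("unexpected datagram length: {} bytes", n),
        }
    }
}
```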
## Health Monitoring
The service includes comprehensive health monitoring with the following metrics:
- **Connection Status**: Real-time health status for each service
- **Uptime Percentage**: Success rate over time
- **Failure Tracking**: Consecutive failure counts and timestamps
- **Performance Metrics**: Total attempts, successes, and failures
Health reports are logged every minute with detailed statistics.
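The health types are exported from the library (`HealthMonitor`, `ServiceType`, `ConnectionHealth`). As a small, hypothetical example of consuming them, the helper below mirrors the thresholds used inside `health_monitor.rs` (warn while unhealthy, escalate after more than 10 consecutive failures); it assumes the `is_healthy` and `consecutive_failures` fields are publicly readable.

```rust
use network_monitor::ConnectionHealth;

/// Hypothetical helper mirroring the alerting rules in health_monitor.rs.
fn alert_level(health: &ConnectionHealth) -> &'static str {
    if health.consecutive_failures > 10 {
        "error" // persistent failure, matches the error!() alert in the service
    } else if !health.is_healthy {
        "warn" // transient trouble, matches the warn!() log in the service
    } else {
        "ok"
    }
}
```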
## Retry Strategy
The service implements a sophisticated retry mechanism (a configuration sketch follows the lists below):
- **Infinite Retries**: Critical services never give up
- **Exponential Backoff**: Delays increase exponentially with failures
- **Jitter**: Random delays prevent thundering herd effects
- **Configurable Limits**: Maximum delays and retry counts can be customized
### Retry Configurations
- **Fast Retry**: For lightweight operations (5 attempts, 100ms-5s delays)
- **Default Retry**: Balanced approach (10 attempts, 500ms-30s delays)
- **Slow Retry**: For heavyweight operations (15 attempts, 1s-60s delays)
- **Infinite Retry**: For critical services (unlimited attempts)
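These presets and delays correspond to `RetryConfig` in `src/retry.rs`. Below is a minimal sketch of building a custom configuration, using only the preset constructor and field names that appear in this repository's integration tests; any fields not shown there are assumed to be inherited from the preset.

```rust
use std::time::Duration;
use network_monitor::retry::RetryConfig;

/// Sketch of a custom retry configuration. Only fields visible in
/// tests/integration_test.rs are set explicitly; the remaining fields
/// (such as the attempt limit) are taken from the fast() preset.
fn custom_retry_config() -> RetryConfig {
    RetryConfig {
        initial_delay: Duration::from_millis(500),
        max_delay: Duration::from_secs(30),
        backoff_multiplier: 2.0,
        jitter: true,
        ..RetryConfig::fast()
    }
}
```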
## Logging
The service uses structured logging with multiple levels:
- **INFO**: Normal operation events and health reports
- **WARN**: Service health issues and recoverable errors
- **ERROR**: Critical failures and persistent issues
- **DEBUG**: Detailed operation information
Set the `RUST_LOG` environment variable to control log levels:
```bash
export RUST_LOG=info # or debug, warn, error
```
## Development
### Project Structure
```
src/
├── main.rs # Application entry point
├── lib.rs # Library exports
├── clash_conn_msg.rs # Clash message structures
├── health_monitor.rs # Health monitoring system
├── retry.rs # Retry mechanism implementation
├── statistics.rs # Traffic statistics processing
├── udp_server.rs # UDP broadcasting server
└── wan.rs # WAN traffic polling
tests/
└── integration_test.rs # Integration tests
```
### Running Tests
```bash
# Run all tests
cargo test
# Run with output
cargo test -- --nocapture
# Run specific test
cargo test test_network_failure_recovery
```
### Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Ensure all tests pass
6. Submit a pull request
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Troubleshooting
### Common Issues
1. **WebSocket Connection Failures**
- Verify Clash is running and accessible
- Check the WebSocket URL and authentication token
- Ensure network connectivity to the Clash instance
2. **LuCI Authentication Failures**
- Verify router credentials
- Check if the router is accessible
- Ensure the LuCI interface is enabled
3. **UDP Client Connection Issues**
- Verify the UDP port is not blocked by firewall
- Check if the service is binding to the correct interface
- Ensure clients are connecting to the correct port
### Debug Mode
Enable debug logging for detailed troubleshooting:
```bash
RUST_LOG=debug cargo run
```
This will provide detailed information about:
- Connection attempts and failures
- Retry mechanisms in action
- Health monitoring decisions
- UDP client management
- Data processing and broadcasting

src/health_monitor.rs

@@ -115,7 +115,7 @@ impl HealthMonitor {
         .get_or_init(|| async {
             let monitor = HealthMonitor::new();
-            // 启动健康状态报告任务
+            // Start health status reporting task
             let monitor_clone = monitor.clone();
             tokio::spawn(async move {
                 monitor_clone.start_health_reporting().await;
@@ -192,7 +192,7 @@ impl HealthMonitor {
         self.log_service_health("WebSocket", &websocket_health);
         self.log_service_health("WAN Polling", &wan_health);
-        // 如果有服务不健康,发出警告
+        // Warn if any service is unhealthy
         if !websocket_health.is_healthy {
             warn!("WebSocket service is unhealthy! Consecutive failures: {}", websocket_health.consecutive_failures);
         }
@@ -200,7 +200,7 @@ impl HealthMonitor {
             warn!("WAN Polling service is unhealthy! Consecutive failures: {}", wan_health.consecutive_failures);
         }
-        // 如果连续失败次数过多,发出错误警报
+        // Alert if consecutive failures are too many
         if websocket_health.consecutive_failures > 10 {
             error!("WebSocket service has {} consecutive failures!", websocket_health.consecutive_failures);
         }

src/lib.rs

@@ -1,6 +1,6 @@
 pub mod retry;
 pub mod health_monitor;
-// 重新导出常用的类型和函数
+// Re-export commonly used types and functions
 pub use retry::{RetryConfig, Retrier, retry_with_config, retry, retry_forever};
 pub use health_monitor::{HealthMonitor, ServiceType, ConnectionHealth};

tests/integration_test.rs

@@ -4,10 +4,10 @@ use std::time::Duration;
 use tokio::time::sleep;
 use network_monitor::retry::{RetryConfig, retry_with_config};
-/// 模拟网络故障的测试
+/// Test simulating network failure recovery
 #[tokio::test]
 async fn test_network_failure_recovery() {
-    // 模拟一个会失败几次然后成功的操作
+    // Simulate an operation that fails a few times then succeeds
     let attempt_count = Arc::new(AtomicU32::new(0));
     let max_failures = 3;
@@ -16,7 +16,7 @@ async fn test_network_failure_recovery() {
         initial_delay: Duration::from_millis(10),
         max_delay: Duration::from_millis(100),
         backoff_multiplier: 1.5,
-        jitter: false, // 关闭抖动以便测试更可预测
+        jitter: false, // Disable jitter for more predictable testing
     };
     let attempt_count_clone = attempt_count.clone();
@@ -26,10 +26,10 @@ async fn test_network_failure_recovery() {
             let current_attempt = attempt_count.fetch_add(1, Ordering::SeqCst) + 1;
             if current_attempt <= max_failures {
-                // 模拟网络错误
+                // Simulate network error
                 Err(format!("Network error on attempt {}", current_attempt))
             } else {
-                // 模拟恢复成功
+                // Simulate successful recovery
                 Ok(format!("Success on attempt {}", current_attempt))
             }
         }
@@ -40,7 +40,7 @@ async fn test_network_failure_recovery() {
     assert_eq!(attempt_count.load(Ordering::SeqCst), 4);
 }
-/// 测试连接超时场景
+/// Test connection timeout scenario
 #[tokio::test]
 async fn test_connection_timeout_scenario() {
     let config = RetryConfig {
@@ -59,17 +59,17 @@ async fn test_connection_timeout_scenario() {
         async move {
             attempt_count.fetch_add(1, Ordering::SeqCst);
-            // 模拟连接超时
+            // Simulate connection timeout
             sleep(Duration::from_millis(1)).await;
             Err("Connection timeout")
         }
     }).await;
     assert!(result.is_err());
-    assert_eq!(attempt_count.load(Ordering::SeqCst), 3); // 应该尝试了3次
+    assert_eq!(attempt_count.load(Ordering::SeqCst), 3); // Should have attempted 3 times
 }
-/// 测试快速恢复场景
+/// Test fast recovery scenario
 #[tokio::test]
 async fn test_fast_recovery() {
     let config = RetryConfig::fast();
@@ -95,7 +95,7 @@ async fn test_fast_recovery() {
     assert_eq!(attempt_count.load(Ordering::SeqCst), 2);
 }
-/// 测试慢速重试场景
+/// Test slow retry scenario
 #[tokio::test]
 async fn test_slow_retry_scenario() {
     let config = RetryConfig::slow();
@@ -121,7 +121,7 @@ async fn test_slow_retry_scenario() {
     assert_eq!(attempt_count.load(Ordering::SeqCst), 3);
 }
-/// 测试最大重试次数限制
+/// Test maximum retry limit
 #[tokio::test]
 async fn test_max_retry_limit() {
     let config = RetryConfig {