Clean up Chinese comments and add comprehensive English README

- Replace all Chinese comments with English equivalents in:
  - src/health_monitor.rs
  - src/lib.rs
  - tests/integration_test.rs
- Add comprehensive README.md with:
  - Project overview and features
  - Architecture diagram
  - Installation and configuration guide
  - Data format specifications
  - Health monitoring documentation
  - Troubleshooting guide

parent 2a9e34d345
commit e2bd5e9be5

README.md (normal file, 252 lines added)
@@ -0,0 +1,252 @@
# Network Monitor

A robust network monitoring service written in Rust that tracks network traffic from multiple sources and provides real-time data via UDP broadcasting. This service is designed for high availability with automatic retry mechanisms and comprehensive health monitoring.

## Features

### 🔄 Dual Network Monitoring

- **Clash Proxy Monitoring**: Connects to Clash proxy via WebSocket to monitor proxy traffic statistics
- **WAN Interface Monitoring**: Polls OpenWRT/LuCI router interfaces for WAN traffic data

### 🚀 High Availability

- **Infinite Retry Mechanism**: Automatically recovers from network failures and service interruptions
- **Health Monitoring**: Comprehensive health tracking with detailed statistics and alerting
- **Exponential Backoff**: Smart retry strategy with configurable delays and jitter

### 📡 Real-time Data Broadcasting

- **UDP Server**: Broadcasts network statistics to connected clients
- **Client Management**: Automatic client discovery and connection management
- **Data Formats**: Structured binary data for efficient transmission

### 🛡️ Robust Error Handling

- **Connection Timeouts**: Configurable timeouts for all network operations
- **Graceful Degradation**: Continues operation even when one monitoring source fails
- **Detailed Logging**: Comprehensive logging for debugging and monitoring

## Architecture

```
┌─────────────────┐    WebSocket    ┌─────────────────┐
│   Clash Proxy   │◄───────────────►│                 │
└─────────────────┘                 │                 │
                                    │    Network      │    UDP
┌─────────────────┐    HTTP/LuCI    │    Monitor      │◄──────────┐
│  OpenWRT Router │◄───────────────►│    Service      │           │
└─────────────────┘                 │                 │           │
                                    └─────────────────┘           │
                                                                  │
┌─────────────────┐    UDP Data     ┌─────────────────┐           │
│    Client 1     │◄───────────────►│   UDP Server    │◄──────────┘
└─────────────────┘                 └─────────────────┘
┌─────────────────┐
│    Client 2     │◄───────────────►
└─────────────────┘
```

## Installation

### Prerequisites

- Rust 1.70+ (for building from source)
- Docker (for containerized deployment)

### Building from Source

```bash
# Clone the repository
git clone <repository-url>
cd network-monitor

# Build the project
cargo build --release

# Run tests
cargo test

# Run the service
cargo run
```

### Docker Deployment

```bash
# Build the Docker image
docker build -t network-monitor .

# Run the container
docker run -d \
  --name network-monitor \
  -p 17890:17890/udp \
  -e CLASH_URL="ws://192.168.1.1:9090/connections?token=your-token" \
  -e LUCI_URL="http://192.168.1.1/cgi-bin/luci" \
  -e LUCI_USERNAME="root" \
  -e LUCI_PASSWORD="your-password" \
  network-monitor
```

## Configuration

The service can be configured via command-line arguments or environment variables:

| Parameter | Environment Variable | Default Value | Description |
|-----------|---------------------|---------------|-------------|
| `-c, --clash-url` | `CLASH_URL` | `ws://192.168.1.1:9090/connections?token=123456` | Clash WebSocket URL |
| `-p, --listen-port` | `LISTEN_PORT` | `17890` | UDP server listen port |
| `-l, --luci-url` | `LUCI_URL` | `http://192.168.1.1/cgi-bin/luci` | OpenWRT LuCI base URL |
| `-u, --luci-username` | `LUCI_USERNAME` | `root` | LuCI authentication username |
| `-P, --luci-password` | `LUCI_PASSWORD` | `123456` | LuCI authentication password |
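
The fallback from environment variable to default value in the table can be sketched as follows. This is a minimal illustration using only the standard library, not the service's actual argument parsing: the `Settings` struct and the `settings_from` and `settings_from_env` helpers are hypothetical names, and command-line flags are left out.

```rust
use std::env;

/// Illustrative settings struct mirroring the configuration table.
/// The type and field names are assumptions, not the crate's real API.
#[derive(Debug, Clone, PartialEq)]
pub struct Settings {
    pub clash_url: String,
    pub listen_port: u16,
    pub luci_url: String,
    pub luci_username: String,
    pub luci_password: String,
}

/// Build settings from any key/value lookup, falling back to the
/// documented defaults when a key is absent (hypothetical helper).
pub fn settings_from<F>(lookup: F) -> Result<Settings, String>
where
    F: Fn(&str) -> Option<String>,
{
    let get = |key: &str, default: &str| lookup(key).unwrap_or_else(|| default.to_string());

    // The port is the only value that needs parsing and validation.
    let port_text = get("LISTEN_PORT", "17890");
    let listen_port = port_text
        .parse::<u16>()
        .map_err(|e| format!("invalid LISTEN_PORT {:?}: {}", port_text, e))?;

    Ok(Settings {
        clash_url: get("CLASH_URL", "ws://192.168.1.1:9090/connections?token=123456"),
        listen_port,
        luci_url: get("LUCI_URL", "http://192.168.1.1/cgi-bin/luci"),
        luci_username: get("LUCI_USERNAME", "root"),
        luci_password: get("LUCI_PASSWORD", "123456"),
    })
}

/// Convenience wrapper that reads the process environment.
pub fn settings_from_env() -> Result<Settings, String> {
    settings_from(|key| env::var(key).ok())
}
```

Taking the lookup as a closure keeps the fallback logic testable without mutating the real process environment.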

### Environment File

Create a `.env` file in the project root:

```env
CLASH_URL=ws://192.168.1.1:9090/connections?token=your-clash-token
LISTEN_PORT=17890
LUCI_URL=http://192.168.1.1/cgi-bin/luci
LUCI_USERNAME=root
LUCI_PASSWORD=your-router-password
```

## Data Formats

### Clash Traffic Data (32 bytes)

```
Bytes 0-7:   Direct upload speed (u64, little-endian)
Bytes 8-15:  Direct download speed (u64, little-endian)
Bytes 16-23: Proxy upload speed (u64, little-endian)
Bytes 24-31: Proxy download speed (u64, little-endian)
```
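
As a rough illustration of this layout, the record can be packed and unpacked with `u64::to_le_bytes` and `u64::from_le_bytes`. The `ClashTraffic` struct and its method names below are assumptions made for the example and may not match the types in `src/statistics.rs`.

```rust
/// Hypothetical in-memory form of the 32-byte Clash traffic record.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ClashTraffic {
    pub direct_upload: u64,
    pub direct_download: u64,
    pub proxy_upload: u64,
    pub proxy_download: u64,
}

impl ClashTraffic {
    /// Wire size: four little-endian u64 fields.
    pub const SIZE: usize = 32;

    /// Serialize the four counters in the documented field order.
    pub fn to_bytes(&self) -> [u8; 32] {
        let mut buf = [0u8; 32];
        buf[0..8].copy_from_slice(&self.direct_upload.to_le_bytes());
        buf[8..16].copy_from_slice(&self.direct_download.to_le_bytes());
        buf[16..24].copy_from_slice(&self.proxy_upload.to_le_bytes());
        buf[24..32].copy_from_slice(&self.proxy_download.to_le_bytes());
        buf
    }

    /// Parse a record; returns `None` unless the buffer is exactly 32 bytes.
    pub fn from_bytes(buf: &[u8]) -> Option<Self> {
        if buf.len() != Self::SIZE {
            return None;
        }
        // Read the i-th 8-byte field as a little-endian u64.
        let field = |i: usize| {
            let mut raw = [0u8; 8];
            raw.copy_from_slice(&buf[i * 8..i * 8 + 8]);
            u64::from_le_bytes(raw)
        };
        Some(Self {
            direct_upload: field(0),
            direct_download: field(1),
            proxy_upload: field(2),
            proxy_download: field(3),
        })
    }
}
```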

### WAN Traffic Data (16 bytes)

```
Bytes 0-7:   WAN upload speed (u64, little-endian)
Bytes 8-15:  WAN download speed (u64, little-endian)
```
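
On the receiving side, a client might tell the two record types apart by datagram length. Note that this is an assumption: the README does not say how clients distinguish Clash records from WAN records, and the `Datagram` enum and `parse_datagram` function are hypothetical names used only for this sketch.

```rust
/// Hypothetical decoded form of a datagram received from the UDP server.
#[derive(Debug, PartialEq, Eq)]
pub enum Datagram {
    Wan { upload: u64, download: u64 },
    Clash {
        direct_upload: u64,
        direct_download: u64,
        proxy_upload: u64,
        proxy_download: u64,
    },
}

/// Read a little-endian u64 starting at `offset`, if the buffer is long enough.
fn read_u64_le(buf: &[u8], offset: usize) -> Option<u64> {
    let slice = buf.get(offset..offset + 8)?;
    let mut raw = [0u8; 8];
    raw.copy_from_slice(slice);
    Some(u64::from_le_bytes(raw))
}

/// Dispatch on payload length (an assumption): 16 bytes is treated as a
/// WAN record, 32 bytes as a Clash record, anything else is rejected.
pub fn parse_datagram(buf: &[u8]) -> Option<Datagram> {
    match buf.len() {
        16 => Some(Datagram::Wan {
            upload: read_u64_le(buf, 0)?,
            download: read_u64_le(buf, 8)?,
        }),
        32 => Some(Datagram::Clash {
            direct_upload: read_u64_le(buf, 0)?,
            direct_download: read_u64_le(buf, 8)?,
            proxy_upload: read_u64_le(buf, 16)?,
            proxy_download: read_u64_le(buf, 24)?,
        }),
        _ => None,
    }
}
```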

## Health Monitoring

The service includes comprehensive health monitoring with the following metrics:

- **Connection Status**: Real-time health status for each service
- **Uptime Percentage**: Success rate over time
- **Failure Tracking**: Consecutive failure counts and timestamps
- **Performance Metrics**: Total attempts, successes, and failures

Health reports are logged every minute with detailed statistics.
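
One way the metrics listed above could be tracked per service is sketched below. This is not the `ConnectionHealth` type from `src/health_monitor.rs`: the `HealthStats` struct, its fields, and the rule that a service counts as healthy when it has no consecutive failures are all assumptions for illustration.

```rust
use std::time::SystemTime;

/// Hypothetical per-service counters behind the listed metrics.
#[derive(Debug, Default)]
pub struct HealthStats {
    pub total_attempts: u64,
    pub successes: u64,
    pub failures: u64,
    pub consecutive_failures: u32,
    pub last_failure: Option<SystemTime>,
}

impl HealthStats {
    /// A success resets the consecutive-failure streak.
    pub fn record_success(&mut self) {
        self.total_attempts += 1;
        self.successes += 1;
        self.consecutive_failures = 0;
    }

    /// A failure extends the streak and remembers when it happened.
    pub fn record_failure(&mut self, at: SystemTime) {
        self.total_attempts += 1;
        self.failures += 1;
        self.consecutive_failures += 1;
        self.last_failure = Some(at);
    }

    /// Success rate as a percentage; 0.0 before any attempt (an assumption).
    pub fn uptime_percentage(&self) -> f64 {
        if self.total_attempts == 0 {
            return 0.0;
        }
        self.successes as f64 / self.total_attempts as f64 * 100.0
    }

    /// Assumed rule: healthy means the most recent attempt succeeded.
    pub fn is_healthy(&self) -> bool {
        self.consecutive_failures == 0
    }
}
```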

## Retry Strategy

The service implements a sophisticated retry mechanism:

- **Infinite Retries**: Critical services never give up
- **Exponential Backoff**: Delays increase exponentially with failures
- **Jitter**: Random delays prevent thundering herd effects
- **Configurable Limits**: Maximum delays and retry counts can be customized

### Retry Configurations

- **Fast Retry**: For lightweight operations (5 attempts, 100ms-5s delays)
- **Default Retry**: Balanced approach (10 attempts, 500ms-30s delays)
- **Slow Retry**: For heavyweight operations (15 attempts, 1s-60s delays)
- **Infinite Retry**: For critical services (unlimited attempts)
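
The delay schedule these profiles imply can be sketched as capped exponential backoff. The code below is an illustration rather than the implementation in `src/retry.rs`: `BackoffPolicy` and its methods are hypothetical, the multiplier is supplied by the caller because the README does not state one for these profiles, and jitter takes a caller-provided random value in `[0, 1]` so the example stays free of external crates.

```rust
use std::time::Duration;

/// Hypothetical retry policy; field names loosely follow the `RetryConfig`
/// fields visible in the integration tests, but this is not that type.
#[derive(Debug, Clone)]
pub struct BackoffPolicy {
    /// `None` models the "Infinite Retry" profile.
    pub max_attempts: Option<u32>,
    pub initial_delay: Duration,
    pub max_delay: Duration,
    pub backoff_multiplier: f64,
}

impl BackoffPolicy {
    /// Delay before retry number `attempt` (1-based), without jitter:
    /// initial_delay * multiplier^(attempt - 1), capped at max_delay.
    pub fn delay_for(&self, attempt: u32) -> Duration {
        let exponent = attempt.saturating_sub(1).min(i32::MAX as u32) as i32;
        let scaled = self.initial_delay.as_secs_f64() * self.backoff_multiplier.powi(exponent);
        Duration::from_secs_f64(scaled.min(self.max_delay.as_secs_f64()))
    }

    /// Scale a delay into the range 50%..100% using `unit` in [0, 1].
    /// The 50% floor is an arbitrary choice for this sketch.
    pub fn with_jitter(&self, delay: Duration, unit: f64) -> Duration {
        delay.mul_f64(0.5 + 0.5 * unit.clamp(0.0, 1.0))
    }

    /// Whether another attempt is allowed after `attempts_made` tries.
    pub fn should_retry(&self, attempts_made: u32) -> bool {
        self.max_attempts.map_or(true, |max| attempts_made < max)
    }
}
```

With an assumed multiplier of 2.0, the "Default Retry" bounds (500ms initial, 30s maximum) would yield delays of 500ms, 1s, 2s, and so on until the 30s cap is reached.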

## Logging

The service uses structured logging with multiple levels:

- **INFO**: Normal operation events and health reports
- **WARN**: Service health issues and recoverable errors
- **ERROR**: Critical failures and persistent issues
- **DEBUG**: Detailed operation information

Set the `RUST_LOG` environment variable to control log levels:

```bash
export RUST_LOG=info # or debug, warn, error
```

## Development

### Project Structure

```
src/
├── main.rs              # Application entry point
├── lib.rs               # Library exports
├── clash_conn_msg.rs    # Clash message structures
├── health_monitor.rs    # Health monitoring system
├── retry.rs             # Retry mechanism implementation
├── statistics.rs        # Traffic statistics processing
├── udp_server.rs        # UDP broadcasting server
└── wan.rs               # WAN traffic polling
tests/
└── integration_test.rs  # Integration tests
```

### Running Tests

```bash
# Run all tests
cargo test

# Run with output
cargo test -- --nocapture

# Run specific test
cargo test test_network_failure_recovery
```

### Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Ensure all tests pass
6. Submit a pull request

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Troubleshooting

### Common Issues

1. **WebSocket Connection Failures**
   - Verify Clash is running and accessible
   - Check the WebSocket URL and authentication token
   - Ensure network connectivity to the Clash instance

2. **LuCI Authentication Failures**
   - Verify router credentials
   - Check if the router is accessible
   - Ensure the LuCI interface is enabled

3. **UDP Client Connection Issues**
   - Verify the UDP port is not blocked by firewall
   - Check if the service is binding to the correct interface
   - Ensure clients are connecting to the correct port

### Debug Mode

Enable debug logging for detailed troubleshooting:

```bash
RUST_LOG=debug cargo run
```

This will provide detailed information about:

- Connection attempts and failures
- Retry mechanisms in action
- Health monitoring decisions
- UDP client management
- Data processing and broadcasting

@@ -115,7 +115,7 @@ impl HealthMonitor {
             .get_or_init(|| async {
                 let monitor = HealthMonitor::new();

-                // 启动健康状态报告任务
+                // Start health status reporting task
                 let monitor_clone = monitor.clone();
                 tokio::spawn(async move {
                     monitor_clone.start_health_reporting().await;
@@ -192,7 +192,7 @@ impl HealthMonitor {
         self.log_service_health("WebSocket", &websocket_health);
         self.log_service_health("WAN Polling", &wan_health);

-        // 如果有服务不健康,发出警告
+        // Warn if any service is unhealthy
         if !websocket_health.is_healthy {
             warn!("WebSocket service is unhealthy! Consecutive failures: {}", websocket_health.consecutive_failures);
         }
@@ -200,7 +200,7 @@ impl HealthMonitor {
             warn!("WAN Polling service is unhealthy! Consecutive failures: {}", wan_health.consecutive_failures);
         }

-        // 如果连续失败次数过多,发出错误警报
+        // Alert if consecutive failures are too many
         if websocket_health.consecutive_failures > 10 {
             error!("WebSocket service has {} consecutive failures!", websocket_health.consecutive_failures);
         }

@@ -1,6 +1,6 @@
 pub mod retry;
 pub mod health_monitor;

-// 重新导出常用的类型和函数
+// Re-export commonly used types and functions
 pub use retry::{RetryConfig, Retrier, retry_with_config, retry, retry_forever};
 pub use health_monitor::{HealthMonitor, ServiceType, ConnectionHealth};

@@ -4,10 +4,10 @@ use std::time::Duration;
 use tokio::time::sleep;
 use network_monitor::retry::{RetryConfig, retry_with_config};

-/// 模拟网络故障的测试
+/// Test simulating network failure recovery
 #[tokio::test]
 async fn test_network_failure_recovery() {
-    // 模拟一个会失败几次然后成功的操作
+    // Simulate an operation that fails a few times then succeeds
     let attempt_count = Arc::new(AtomicU32::new(0));
     let max_failures = 3;

@@ -16,7 +16,7 @@ async fn test_network_failure_recovery() {
         initial_delay: Duration::from_millis(10),
         max_delay: Duration::from_millis(100),
         backoff_multiplier: 1.5,
-        jitter: false, // 关闭抖动以便测试更可预测
+        jitter: false, // Disable jitter for more predictable testing
     };

     let attempt_count_clone = attempt_count.clone();
@@ -26,10 +26,10 @@ async fn test_network_failure_recovery() {
             let current_attempt = attempt_count.fetch_add(1, Ordering::SeqCst) + 1;

             if current_attempt <= max_failures {
-                // 模拟网络错误
+                // Simulate network error
                 Err(format!("Network error on attempt {}", current_attempt))
             } else {
-                // 模拟恢复成功
+                // Simulate successful recovery
                 Ok(format!("Success on attempt {}", current_attempt))
             }
         }
@@ -40,7 +40,7 @@ async fn test_network_failure_recovery() {
     assert_eq!(attempt_count.load(Ordering::SeqCst), 4);
 }

-/// 测试连接超时场景
+/// Test connection timeout scenario
 #[tokio::test]
 async fn test_connection_timeout_scenario() {
     let config = RetryConfig {
@@ -59,17 +59,17 @@ async fn test_connection_timeout_scenario() {
         async move {
             attempt_count.fetch_add(1, Ordering::SeqCst);

-            // 模拟连接超时
+            // Simulate connection timeout
             sleep(Duration::from_millis(1)).await;
             Err("Connection timeout")
         }
     }).await;

     assert!(result.is_err());
-    assert_eq!(attempt_count.load(Ordering::SeqCst), 3); // 应该尝试了3次
+    assert_eq!(attempt_count.load(Ordering::SeqCst), 3); // Should have attempted 3 times
 }

-/// 测试快速恢复场景
+/// Test fast recovery scenario
 #[tokio::test]
 async fn test_fast_recovery() {
     let config = RetryConfig::fast();
@@ -95,7 +95,7 @@ async fn test_fast_recovery() {
     assert_eq!(attempt_count.load(Ordering::SeqCst), 2);
 }

-/// 测试慢速重试场景
+/// Test slow retry scenario
 #[tokio::test]
 async fn test_slow_retry_scenario() {
     let config = RetryConfig::slow();
@@ -121,7 +121,7 @@ async fn test_slow_retry_scenario() {
     assert_eq!(attempt_count.load(Ordering::SeqCst), 3);
 }

-/// 测试最大重试次数限制
+/// Test maximum retry limit
 #[tokio::test]
 async fn test_max_retry_limit() {
     let config = RetryConfig {