network-monitor/README.md
Ivan Li e2bd5e9be5 Clean up Chinese comments and add comprehensive English README
- Replace all Chinese comments with English equivalents in:
  - src/health_monitor.rs
  - src/lib.rs
  - tests/integration_test.rs
- Add comprehensive README.md with:
  - Project overview and features
  - Architecture diagram
  - Installation and configuration guide
  - Data format specifications
  - Health monitoring documentation
  - Troubleshooting guide
2025-06-30 17:40:37 +08:00

# Network Monitor
A network monitoring service written in Rust that tracks traffic from two sources, a Clash proxy and an OpenWRT router's WAN interface, and broadcasts the resulting statistics over UDP in real time. It is designed for high availability, with automatic retries and per-service health monitoring.
## Features
### 🔄 Dual Network Monitoring
- **Clash Proxy Monitoring**: Connects to Clash proxy via WebSocket to monitor proxy traffic statistics
- **WAN Interface Monitoring**: Polls OpenWRT/LuCI router interfaces for WAN traffic data
### 🚀 High Availability
- **Infinite Retry Mechanism**: Automatically recovers from network failures and service interruptions
- **Health Monitoring**: Comprehensive health tracking with detailed statistics and alerting
- **Exponential Backoff**: Smart retry strategy with configurable delays and jitter
### 📡 Real-time Data Broadcasting
- **UDP Server**: Broadcasts network statistics to connected clients
- **Client Management**: Automatic client discovery and connection management
- **Data Formats**: Structured binary data for efficient transmission
### 🛡️ Robust Error Handling
- **Connection Timeouts**: Configurable timeouts for all network operations
- **Graceful Degradation**: Continues operation even when one monitoring source fails
- **Detailed Logging**: Comprehensive logging for debugging and monitoring
## Architecture
```
┌─────────────────┐    WebSocket    ┌─────────────────┐
│   Clash Proxy   │◄───────────────►│                 │
└─────────────────┘                 │                 │
                                    │     Network     │    UDP
┌─────────────────┐    HTTP/LuCI    │     Monitor     │◄──────────┐
│ OpenWRT Router  │◄───────────────►│     Service     │           │
└─────────────────┘                 │                 │           │
                                    └─────────────────┘           │
┌─────────────────┐    UDP Data     ┌─────────────────┐           │
│    Client 1     │◄───────────────►│   UDP Server    │◄──────────┘
└─────────────────┘                 └─────────────────┘
┌─────────────────┐
│    Client 2     │◄───────────────►
└─────────────────┘
```
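
The UDP leg of the diagram, one server sending the same statistics to several clients, can be sketched with the standard library alone. This is an illustration under assumptions, not the code in `udp_server.rs`: the `fan_out` and `bind_loopback` helpers are hypothetical, and keeping clients in a plain slice of addresses is a simplification.

```rust
use std::io;
use std::net::{SocketAddr, UdpSocket};

/// Send one payload to every known client address.
/// Returns how many sends the OS accepted; UDP gives no delivery guarantee.
/// Hypothetical helper: the real service may track clients differently.
pub fn fan_out(socket: &UdpSocket, clients: &[SocketAddr], payload: &[u8]) -> usize {
    clients
        .iter()
        .filter(|addr| socket.send_to(payload, **addr).is_ok())
        .count()
}

/// Bind a UDP socket on an OS-assigned loopback port, handy for local experiments.
pub fn bind_loopback() -> io::Result<UdpSocket> {
    UdpSocket::bind("127.0.0.1:0")
}
```

Counting accepted sends rather than failing on the first error matches the idea that one unreachable client should not stop delivery to the others.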
## Installation
### Prerequisites
- Rust 1.70+ (for building from source)
- Docker (for containerized deployment)
### Building from Source
```bash
# Clone the repository
git clone <repository-url>
cd network-monitor
# Build the project
cargo build --release
# Run tests
cargo test
# Run the service (release build)
cargo run --release
```
### Docker Deployment
```bash
# Build the Docker image
docker build -t network-monitor .
# Run the container
docker run -d \
  --name network-monitor \
  -p 17890:17890/udp \
  -e CLASH_URL="ws://192.168.1.1:9090/connections?token=your-token" \
  -e LUCI_URL="http://192.168.1.1/cgi-bin/luci" \
  -e LUCI_USERNAME="root" \
  -e LUCI_PASSWORD="your-password" \
  network-monitor
```
## Configuration
The service can be configured via command-line arguments or environment variables:
| Parameter | Environment Variable | Default Value | Description |
|-----------|---------------------|---------------|-------------|
| `-c, --clash-url` | `CLASH_URL` | `ws://192.168.1.1:9090/connections?token=123456` | Clash WebSocket URL |
| `-p, --listen-port` | `LISTEN_PORT` | `17890` | UDP server listen port |
| `-l, --luci-url` | `LUCI_URL` | `http://192.168.1.1/cgi-bin/luci` | OpenWRT LuCI base URL |
| `-u, --luci-username` | `LUCI_USERNAME` | `root` | LuCI authentication username |
| `-P, --luci-password` | `LUCI_PASSWORD` | `123456` | LuCI authentication password |
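
Each setting in the table has three possible sources: a flag, an environment variable, and a default. The sketch below shows one common way to combine them. The precedence (flag, then environment, then default) is an assumption rather than something the project confirms, and `resolve` and `resolve_listen_port` are hypothetical names.

```rust
use std::env;

/// Resolve one setting. Assumed precedence (not confirmed by the project):
/// command-line value, then environment variable, then built-in default.
pub fn resolve(cli_value: Option<&str>, env_value: Option<&str>, default: &str) -> String {
    cli_value.or(env_value).unwrap_or(default).to_string()
}

/// Resolve the UDP listen port. Falls back to 17890, the default from the
/// table, when no value is given or the value is not a valid u16.
pub fn resolve_listen_port(cli_value: Option<&str>) -> u16 {
    let from_env = env::var("LISTEN_PORT").ok();
    resolve(cli_value, from_env.as_deref(), "17890")
        .parse()
        .unwrap_or(17890)
}
```

Passing the environment value in as a parameter keeps `resolve` free of global state, so the precedence rule can be checked without touching the process environment.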
### Environment File
Create a `.env` file in the project root:
```env
CLASH_URL=ws://192.168.1.1:9090/connections?token=your-clash-token
LISTEN_PORT=17890
LUCI_URL=http://192.168.1.1/cgi-bin/luci
LUCI_USERNAME=root
LUCI_PASSWORD=your-router-password
```
## Data Formats
### Clash Traffic Data (32 bytes)
```
Bytes 0-7: Direct upload speed (u64, little-endian)
Bytes 8-15: Direct download speed (u64, little-endian)
Bytes 16-23: Proxy upload speed (u64, little-endian)
Bytes 24-31: Proxy download speed (u64, little-endian)
```
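
A client can decode this layout with the standard library alone. The sketch below is a minimal illustration; the `ClashTraffic` struct, its field names, and `decode_clash` are hypothetical and not taken from the project's source.

```rust
/// Traffic rates carried in one 32-byte Clash packet, in the documented field order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ClashTraffic {
    pub direct_upload: u64,
    pub direct_download: u64,
    pub proxy_upload: u64,
    pub proxy_download: u64,
}

/// Read a little-endian u64 starting at `offset`; the caller guarantees bounds.
fn read_u64_le(buf: &[u8], offset: usize) -> u64 {
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(&buf[offset..offset + 8]);
    u64::from_le_bytes(bytes)
}

/// Decode a datagram, returning `None` unless it is exactly 32 bytes long.
pub fn decode_clash(buf: &[u8]) -> Option<ClashTraffic> {
    if buf.len() != 32 {
        return None;
    }
    Some(ClashTraffic {
        direct_upload: read_u64_le(buf, 0),
        direct_download: read_u64_le(buf, 8),
        proxy_upload: read_u64_le(buf, 16),
        proxy_download: read_u64_le(buf, 24),
    })
}
```

Rejecting any length other than 32 keeps a truncated or unrelated datagram from being misread as traffic data.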
### WAN Traffic Data (16 bytes)
```
Bytes 0-7: WAN upload speed (u64, little-endian)
Bytes 8-15: WAN download speed (u64, little-endian)
```
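
The same layout can be written and read back symmetrically. The following sketch shows both directions for the 16-byte WAN packet; `WanTraffic` and its methods are hypothetical names used for illustration only.

```rust
/// WAN rates in the documented 16-byte layout.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct WanTraffic {
    pub upload: u64,
    pub download: u64,
}

impl WanTraffic {
    /// Serialize as two little-endian u64 values: upload first, then download.
    pub fn to_bytes(&self) -> [u8; 16] {
        let mut out = [0u8; 16];
        out[..8].copy_from_slice(&self.upload.to_le_bytes());
        out[8..].copy_from_slice(&self.download.to_le_bytes());
        out
    }

    /// Parse the layout back; `None` if the slice is not exactly 16 bytes.
    pub fn from_bytes(buf: &[u8]) -> Option<Self> {
        if buf.len() != 16 {
            return None;
        }
        let mut up = [0u8; 8];
        let mut down = [0u8; 8];
        up.copy_from_slice(&buf[..8]);
        down.copy_from_slice(&buf[8..]);
        Some(WanTraffic {
            upload: u64::from_le_bytes(up),
            download: u64::from_le_bytes(down),
        })
    }
}
```

Little-endian order means the least significant byte comes first, so a download value of `0x0102` is written as `02 01` followed by six zero bytes.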
## Health Monitoring
The service includes comprehensive health monitoring with the following metrics:
- **Connection Status**: Real-time health status for each service
- **Uptime Percentage**: Success rate over time, expressed as a percentage
- **Failure Tracking**: Consecutive failure counts and timestamps
- **Performance Metrics**: Total attempts, successes, and failures
Health reports are logged every minute with detailed statistics.
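
The counters behind these metrics are simple to model. The sketch below is not the code in `health_monitor.rs`: the `HealthStats` type, its fields, and the failure threshold passed to `is_healthy` are assumptions, and timestamps are left out to keep it short.

```rust
/// Per-service counters mirroring the metrics listed above.
/// Names are hypothetical, not taken from the project's source.
#[derive(Debug, Default)]
pub struct HealthStats {
    pub total_attempts: u64,
    pub successes: u64,
    pub failures: u64,
    pub consecutive_failures: u32,
}

impl HealthStats {
    /// Count a successful attempt and reset the failure streak.
    pub fn record_success(&mut self) {
        self.total_attempts += 1;
        self.successes += 1;
        self.consecutive_failures = 0;
    }

    /// Count a failed attempt and extend the failure streak.
    pub fn record_failure(&mut self) {
        self.total_attempts += 1;
        self.failures += 1;
        self.consecutive_failures += 1;
    }

    /// Success rate as a percentage; 100.0 before any attempt (an assumption).
    pub fn uptime_percentage(&self) -> f64 {
        if self.total_attempts == 0 {
            return 100.0;
        }
        self.successes as f64 / self.total_attempts as f64 * 100.0
    }

    /// Healthy while the failure streak stays below the caller's threshold.
    pub fn is_healthy(&self, max_consecutive_failures: u32) -> bool {
        self.consecutive_failures < max_consecutive_failures
    }
}
```

Tracking the consecutive count separately from the totals lets a single success clear an unhealthy state without erasing the long-run success rate.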
## Retry Strategy
The retry mechanism combines four behaviors:
- **Infinite Retries**: Critical services never give up
- **Exponential Backoff**: Delays increase exponentially with failures
- **Jitter**: Random delays prevent thundering herd effects
- **Configurable Limits**: Maximum delays and retry counts can be customized
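
Exponential backoff with a cap and jitter can be sketched as follows. This is one plausible formula, not necessarily the one in `retry.rs`: `backoff_delay` is a hypothetical function, the doubling factor and additive jitter are assumptions, and the random value is passed in by the caller so the example needs no external crate.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): `base * 2^attempt`, capped at `max`,
/// then lengthened by `jitter_ratio * random_unit` of the capped delay.
/// `random_unit` is expected in [0, 1] and is clamped to that range.
/// The cap applies before jitter, so the result can exceed `max` by up to `jitter_ratio`.
pub fn backoff_delay(
    base: Duration,
    max: Duration,
    attempt: u32,
    jitter_ratio: f64,
    random_unit: f64,
) -> Duration {
    // Saturate instead of overflowing for very large attempt numbers.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    let capped = base.checked_mul(factor).unwrap_or(max).min(max);
    let jitter = capped.mul_f64(jitter_ratio.max(0.0) * random_unit.clamp(0.0, 1.0));
    capped + jitter
}
```

With a 500 ms base and a 30 s cap, attempts 0 through 3 wait 500 ms, 1 s, 2 s, and 4 s before jitter, and attempt 6 onward waits the full 30 s.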
### Retry Configurations
- **Fast Retry**: For lightweight operations (up to 5 attempts, delays from 100 ms to 5 s)
- **Default Retry**: Balanced approach (up to 10 attempts, delays from 500 ms to 30 s)
- **Slow Retry**: For heavyweight operations (up to 15 attempts, delays from 1 s to 60 s)
- **Infinite Retry**: For critical services (unlimited attempts)
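
These presets map naturally onto a small value type. The sketch below uses hypothetical names (`RetryPreset`, `FAST`, `DEFAULT`, `SLOW`); the attempt counts and delay ranges are the ones listed above. No delay range is given for the infinite preset, so the sketch takes it from the caller instead of guessing.

```rust
use std::time::Duration;

/// One retry preset. `max_attempts: None` stands for unlimited attempts.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RetryPreset {
    pub max_attempts: Option<u32>,
    pub min_delay: Duration,
    pub max_delay: Duration,
}

pub const FAST: RetryPreset = RetryPreset {
    max_attempts: Some(5),
    min_delay: Duration::from_millis(100),
    max_delay: Duration::from_secs(5),
};

pub const DEFAULT: RetryPreset = RetryPreset {
    max_attempts: Some(10),
    min_delay: Duration::from_millis(500),
    max_delay: Duration::from_secs(30),
};

pub const SLOW: RetryPreset = RetryPreset {
    max_attempts: Some(15),
    min_delay: Duration::from_secs(1),
    max_delay: Duration::from_secs(60),
};

impl RetryPreset {
    /// Unlimited attempts; the caller chooses the delay range.
    pub const fn infinite(min_delay: Duration, max_delay: Duration) -> Self {
        RetryPreset { max_attempts: None, min_delay, max_delay }
    }

    /// Whether another attempt is allowed after `attempts_made` tries.
    pub fn allows_attempt(&self, attempts_made: u32) -> bool {
        match self.max_attempts {
            Some(limit) => attempts_made < limit,
            None => true,
        }
    }
}
```

Modeling the unlimited case as `None` rather than a very large number keeps the "never give up" intent explicit in the type.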
## Logging
The service uses structured logging with multiple levels:
- **INFO**: Normal operation events and health reports
- **WARN**: Service health issues and recoverable errors
- **ERROR**: Critical failures and persistent issues
- **DEBUG**: Detailed operation information
Set the `RUST_LOG` environment variable to control log levels:
```bash
export RUST_LOG=info # or debug, warn, error
```
## Development
### Project Structure
```
src/
├── main.rs # Application entry point
├── lib.rs # Library exports
├── clash_conn_msg.rs # Clash message structures
├── health_monitor.rs # Health monitoring system
├── retry.rs # Retry mechanism implementation
├── statistics.rs # Traffic statistics processing
├── udp_server.rs # UDP broadcasting server
└── wan.rs # WAN traffic polling
tests/
└── integration_test.rs # Integration tests
```
### Running Tests
```bash
# Run all tests
cargo test
# Run with output
cargo test -- --nocapture
# Run specific test
cargo test test_network_failure_recovery
```
### Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Ensure all tests pass
6. Submit a pull request
## License
This project is licensed under the MIT License. See the LICENSE file for details.
## Troubleshooting
### Common Issues
1. **WebSocket Connection Failures**
   - Verify Clash is running and accessible
   - Check the WebSocket URL and authentication token
   - Ensure network connectivity to the Clash instance
2. **LuCI Authentication Failures**
   - Verify router credentials
   - Check if the router is accessible
   - Ensure the LuCI interface is enabled
3. **UDP Client Connection Issues**
   - Verify the UDP port is not blocked by a firewall
   - Check if the service is binding to the correct interface
   - Ensure clients are connecting to the correct port
### Debug Mode
Enable debug logging for detailed troubleshooting:
```bash
RUST_LOG=debug cargo run
```
This will provide detailed information about:
- Connection attempts and failures
- Retry mechanisms in action
- Health monitoring decisions
- UDP client management
- Data processing and broadcasting