Ivan Li e2bd5e9be5 Clean up Chinese comments and add comprehensive English README
- Replace all Chinese comments with English equivalents in:
  - src/health_monitor.rs
  - src/lib.rs
  - tests/integration_test.rs
- Add comprehensive README.md with:
  - Project overview and features
  - Architecture diagram
  - Installation and configuration guide
  - Data format specifications
  - Health monitoring documentation
  - Troubleshooting guide
2025-06-30 17:40:37 +08:00

Network Monitor

A robust network monitoring service written in Rust. It tracks traffic from two sources, a Clash proxy and an OpenWRT router's WAN interface, and sends real-time statistics to clients over UDP. It is designed for high availability, with automatic retries and built-in health monitoring.

Features

🔄 Dual Network Monitoring

  • Clash Proxy Monitoring: Connects to the Clash proxy over WebSocket and reads its traffic statistics
  • WAN Interface Monitoring: Polls the OpenWRT router's LuCI interface for WAN traffic data

🚀 High Availability

  • Infinite Retry Mechanism: Automatically recovers from network failures and service interruptions
  • Health Monitoring: Comprehensive health tracking with detailed statistics and alerting
  • Exponential Backoff: Retry delays grow exponentially up to a configurable maximum, with random jitter

📡 Real-time Data Broadcasting

  • UDP Server: Broadcasts network statistics to connected clients
  • Client Management: Automatic client discovery and connection management
  • Data Formats: Compact fixed-size binary payloads (little-endian u64 fields) for efficient transmission
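The README does not spell out how the UDP server learns about its clients. Below is a minimal std-only sketch of one common pattern, assuming that any datagram a client sends counts as a registration. `Broadcaster`, `accept_registrations` and `broadcast` are hypothetical names, not the crate's actual API.

```rust
use std::collections::HashSet;
use std::io;
use std::net::{SocketAddr, UdpSocket};

/// Hypothetical fan-out server: remembers every address that has sent it a
/// datagram and pushes each payload to all of them.
pub struct Broadcaster {
    socket: UdpSocket,
    clients: HashSet<SocketAddr>,
}

impl Broadcaster {
    pub fn bind(addr: &str) -> io::Result<Self> {
        let socket = UdpSocket::bind(addr)?;
        // Non-blocking, so draining pending registrations never stalls the send loop.
        socket.set_nonblocking(true)?;
        Ok(Self { socket, clients: HashSet::new() })
    }

    pub fn local_addr(&self) -> io::Result<SocketAddr> {
        self.socket.local_addr()
    }

    /// Records the sender of every queued datagram as a client.
    /// Assumption: the datagram's contents are irrelevant; arrival alone registers.
    pub fn accept_registrations(&mut self) -> io::Result<()> {
        let mut buf = [0u8; 64];
        loop {
            match self.socket.recv_from(&mut buf) {
                Ok((_, from)) => {
                    self.clients.insert(from);
                }
                Err(e) if e.kind() == io::ErrorKind::WouldBlock => return Ok(()),
                Err(e) => return Err(e),
            }
        }
    }

    /// Sends one payload to every known client; returns how many sends succeeded.
    pub fn broadcast(&self, payload: &[u8]) -> usize {
        self.clients
            .iter()
            .filter(|addr| self.socket.send_to(payload, **addr).is_ok())
            .count()
    }
}
```

A sender loop would call `accept_registrations` and then `broadcast` whenever new statistics are ready. Because UDP keeps no connection state, a real implementation would also need to expire clients that stop listening.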

🛡️ Robust Error Handling

  • Connection Timeouts: Configurable timeouts for all network operations
  • Graceful Degradation: Continues operation even when one monitoring source fails
  • Detailed Logging: Comprehensive logging for debugging and monitoring

Architecture

┌─────────────────┐    WebSocket    ┌─────────────────┐
│   Clash Proxy   │◄───────────────►│                 │
└─────────────────┘                 │                 │
                                    │  Network        │    UDP
┌─────────────────┐    HTTP/LuCI    │  Monitor        │◄──────────┐
│ OpenWRT Router  │◄───────────────►│  Service        │           │
└─────────────────┘                 │                 │           │
                                    └─────────────────┘           │
                                                                  │
┌─────────────────┐    UDP Data     ┌─────────────────┐           │
│    Client 1     │◄───────────────►│   UDP Server    │◄──────────┘
└─────────────────┘                 │                 │
┌─────────────────┐    UDP Data     │                 │
│    Client 2     │◄───────────────►│                 │
└─────────────────┘                 └─────────────────┘

Installation

Prerequisites

  • Rust 1.70+ (for building from source)
  • Docker (for containerized deployment)

Building from Source

# Clone the repository
git clone <repository-url>
cd network-monitor

# Build the project
cargo build --release

# Run tests
cargo test

# Run the service
cargo run

Docker Deployment

# Build the Docker image
docker build -t network-monitor .

# Run the container
docker run -d \
  --name network-monitor \
  -p 17890:17890/udp \
  -e CLASH_URL="ws://192.168.1.1:9090/connections?token=your-token" \
  -e LUCI_URL="http://192.168.1.1/cgi-bin/luci" \
  -e LUCI_USERNAME="root" \
  -e LUCI_PASSWORD="your-password" \
  network-monitor

Configuration

The service can be configured via command-line arguments or environment variables:

| Parameter           | Environment Variable | Default Value                                  | Description                  |
|---------------------|----------------------|------------------------------------------------|------------------------------|
| -c, --clash-url     | CLASH_URL            | ws://192.168.1.1:9090/connections?token=123456 | Clash WebSocket URL          |
| -p, --listen-port   | LISTEN_PORT          | 17890                                          | UDP server listen port       |
| -l, --luci-url      | LUCI_URL             | http://192.168.1.1/cgi-bin/luci                | OpenWRT LuCI base URL        |
| -u, --luci-username | LUCI_USERNAME        | root                                           | LuCI authentication username |
| -P, --luci-password | LUCI_PASSWORD        | 123456                                         | LuCI authentication password |
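For illustration, here is a std-only sketch of the environment-variable fallback described in the table. It covers only the "Environment Variable" and "Default Value" columns; parsing the command-line flags (which the real binary presumably does with an argument-parsing crate) and loading a `.env` file are left out. `Config`, `resolve` and `from_env` are hypothetical names.

```rust
/// Hypothetical settings struct mirroring the configuration table.
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub clash_url: String,
    pub listen_port: u16,
    pub luci_url: String,
    pub luci_username: String,
    pub luci_password: String,
}

impl Config {
    /// Resolves each setting through `lookup` (e.g. the process environment),
    /// falling back to the documented default. An unparsable LISTEN_PORT is an error.
    pub fn resolve<F>(lookup: F) -> Result<Self, String>
    where
        F: Fn(&str) -> Option<String>,
    {
        let get = |key: &str, default: &str| lookup(key).unwrap_or_else(|| default.to_string());
        let port_text = get("LISTEN_PORT", "17890");
        let listen_port = port_text
            .parse::<u16>()
            .map_err(|e| format!("invalid LISTEN_PORT {:?}: {}", port_text, e))?;
        Ok(Config {
            clash_url: get("CLASH_URL", "ws://192.168.1.1:9090/connections?token=123456"),
            listen_port,
            luci_url: get("LUCI_URL", "http://192.168.1.1/cgi-bin/luci"),
            luci_username: get("LUCI_USERNAME", "root"),
            luci_password: get("LUCI_PASSWORD", "123456"),
        })
    }

    /// Entry point that reads the real process environment.
    pub fn from_env() -> Result<Self, String> {
        Self::resolve(|key| std::env::var(key).ok())
    }
}
```

Passing `lookup` in rather than reading `std::env` directly keeps the fallback logic testable without touching the process environment.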

Environment File

Create a .env file in the project root:

CLASH_URL=ws://192.168.1.1:9090/connections?token=your-clash-token
LISTEN_PORT=17890
LUCI_URL=http://192.168.1.1/cgi-bin/luci
LUCI_USERNAME=root
LUCI_PASSWORD=your-router-password

Data Formats

Clash Traffic Data (32 bytes)

Bytes 0-7:   Direct upload speed (u64, little-endian)
Bytes 8-15:  Direct download speed (u64, little-endian)
Bytes 16-23: Proxy upload speed (u64, little-endian)
Bytes 24-31: Proxy download speed (u64, little-endian)
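This layout can be produced and parsed with the standard library's `u64::to_le_bytes` and `u64::from_le_bytes`. Below is a minimal sketch. The `ClashTraffic` type is hypothetical, and reading the values as bytes per second is an assumption because the README does not state units.

```rust
/// Hypothetical view of the 32-byte Clash datagram (units assumed to be bytes/s).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ClashTraffic {
    pub direct_up: u64,
    pub direct_down: u64,
    pub proxy_up: u64,
    pub proxy_down: u64,
}

/// Reads the little-endian u64 that starts at `offset`.
fn read_u64_le(buf: &[u8], offset: usize) -> u64 {
    let mut word = [0u8; 8];
    word.copy_from_slice(&buf[offset..offset + 8]);
    u64::from_le_bytes(word)
}

impl ClashTraffic {
    pub const LEN: usize = 32;

    /// Writes the four fields in table order, each as a little-endian u64.
    pub fn to_bytes(&self) -> [u8; 32] {
        let mut buf = [0u8; 32];
        buf[0..8].copy_from_slice(&self.direct_up.to_le_bytes());
        buf[8..16].copy_from_slice(&self.direct_down.to_le_bytes());
        buf[16..24].copy_from_slice(&self.proxy_up.to_le_bytes());
        buf[24..32].copy_from_slice(&self.proxy_down.to_le_bytes());
        buf
    }

    /// Parses a datagram; returns None unless it is exactly 32 bytes long.
    pub fn from_bytes(buf: &[u8]) -> Option<Self> {
        if buf.len() != Self::LEN {
            return None;
        }
        Some(Self {
            direct_up: read_u64_le(buf, 0),
            direct_down: read_u64_le(buf, 8),
            proxy_up: read_u64_le(buf, 16),
            proxy_down: read_u64_le(buf, 24),
        })
    }
}
```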

WAN Traffic Data (16 bytes)

Bytes 0-7:   WAN upload speed (u64, little-endian)
Bytes 8-15:  WAN download speed (u64, little-endian)
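A client listening on a single socket receives both payload types. Assuming the datagram length alone identifies the source (16 bytes for WAN, 32 bytes for Clash), a receiver might classify packets as sketched below. `Packet`, `decode` and `encode_wan` are hypothetical names.

```rust
/// Hypothetical decoded datagram, classified purely by its length.
#[derive(Debug, PartialEq, Eq)]
pub enum Packet {
    /// 16 bytes: WAN upload speed, WAN download speed.
    Wan { up: u64, down: u64 },
    /// 32 bytes: direct up, direct down, proxy up, proxy down.
    Clash([u64; 4]),
}

/// Reads the little-endian u64 that starts at `offset`.
fn read_u64_le(buf: &[u8], offset: usize) -> u64 {
    let mut word = [0u8; 8];
    word.copy_from_slice(&buf[offset..offset + 8]);
    u64::from_le_bytes(word)
}

/// Returns None for any length other than 16 or 32 bytes.
pub fn decode(buf: &[u8]) -> Option<Packet> {
    match buf.len() {
        16 => Some(Packet::Wan { up: read_u64_le(buf, 0), down: read_u64_le(buf, 8) }),
        32 => {
            let mut fields = [0u64; 4];
            for (i, field) in fields.iter_mut().enumerate() {
                *field = read_u64_le(buf, i * 8);
            }
            Some(Packet::Clash(fields))
        }
        _ => None,
    }
}

/// Builds a 16-byte WAN payload in the documented layout.
pub fn encode_wan(up: u64, down: u64) -> [u8; 16] {
    let mut buf = [0u8; 16];
    buf[..8].copy_from_slice(&up.to_le_bytes());
    buf[8..].copy_from_slice(&down.to_le_bytes());
    buf
}
```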

Health Monitoring

The service includes comprehensive health monitoring with the following metrics:

  • Connection Status: Real-time health status for each service
  • Uptime Percentage: Share of connection attempts that succeeded
  • Failure Tracking: Consecutive failure counts and timestamps
  • Counters: Total attempts, successes, and failures

Health reports are logged every minute with detailed statistics.
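The metrics above map naturally onto a small counter struct. The sketch below is hypothetical: the type, its method names, and the rule that a service becomes unhealthy after `threshold` consecutive failures are assumptions, not taken from `src/health_monitor.rs`.

```rust
use std::time::SystemTime;

/// Hypothetical per-service health counters.
#[derive(Debug, Default)]
pub struct HealthStats {
    pub total_attempts: u64,
    pub successes: u64,
    pub failures: u64,
    pub consecutive_failures: u32,
    pub last_failure: Option<SystemTime>,
}

impl HealthStats {
    pub fn record_success(&mut self) {
        self.total_attempts += 1;
        self.successes += 1;
        self.consecutive_failures = 0; // any success resets the streak
    }

    pub fn record_failure(&mut self) {
        self.total_attempts += 1;
        self.failures += 1;
        self.consecutive_failures += 1;
        self.last_failure = Some(SystemTime::now());
    }

    /// Successes as a percentage of all attempts (100% before the first attempt).
    pub fn uptime_percentage(&self) -> f64 {
        if self.total_attempts == 0 {
            100.0
        } else {
            self.successes as f64 * 100.0 / self.total_attempts as f64
        }
    }

    /// Assumed rule: healthy until `threshold` failures occur in a row.
    pub fn is_healthy(&self, threshold: u32) -> bool {
        self.consecutive_failures < threshold
    }

    /// One line suitable for the periodic health log.
    pub fn report(&self, name: &str, threshold: u32) -> String {
        format!(
            "{}: healthy={} uptime={:.1}% attempts={} ok={} failed={} consecutive_failures={}",
            name,
            self.is_healthy(threshold),
            self.uptime_percentage(),
            self.total_attempts,
            self.successes,
            self.failures,
            self.consecutive_failures
        )
    }
}
```

A reporting thread could call `report` for each service once a minute to emit the health log line.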

Retry Strategy

The service retries failed operations as follows:

  • Infinite Retries: Critical services retry indefinitely instead of giving up
  • Exponential Backoff: The delay grows exponentially with each consecutive failure
  • Jitter: A random component in each delay spreads retries out over time, avoiding thundering-herd bursts
  • Configurable Limits: Maximum delays and retry counts can be customized
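One way to compute the delay for a given attempt is sketched below using only the standard library. The doubling factor, the "half fixed, half random" jitter scheme, and the `RandomState`-based random source are assumptions chosen for illustration. The service's `src/retry.rs` may work differently.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`,
/// then scaled into [max/2, max] of that value by `jitter` (expected in [0, 1]).
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration, jitter: f64) -> Duration {
    let exponential = base.saturating_mul(2u32.saturating_pow(attempt));
    let capped = exponential.min(max);
    capped.mul_f64(0.5 + 0.5 * jitter.clamp(0.0, 1.0))
}

/// Cheap std-only random fraction in [0, 1): each RandomState gets fresh random keys.
pub fn random_fraction() -> f64 {
    let mut hasher = RandomState::new().build_hasher();
    hasher.write_u64(0);
    // Keep the top 53 bits so the value fits exactly in an f64 mantissa.
    (hasher.finish() >> 11) as f64 / (1u64 << 53) as f64
}
```

In use, a caller would pass `random_fraction()` as `jitter`; keeping the random source as a parameter makes the delay arithmetic deterministic and easy to test.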

Retry Configurations

  • Fast Retry: For lightweight operations (5 attempts, 100ms-5s delays)
  • Default Retry: Balanced approach (10 attempts, 500ms-30s delays)
  • Slow Retry: For heavyweight operations (15 attempts, 1s-60s delays)
  • Infinite Retry: For critical services (unlimited attempts)
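The four presets can be expressed as plain data plus a generic retry loop. In this sketch the attempt counts and delay ranges come from the list above. `RetryConfig` and `retry`, the delay bounds for the infinite preset (which the list does not give), and the omission of jitter are illustrative choices, not the crate's API.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical retry preset; `max_attempts: None` means retry forever.
#[derive(Debug, Clone, Copy)]
pub struct RetryConfig {
    pub max_attempts: Option<u32>,
    pub initial_delay: Duration,
    pub max_delay: Duration,
}

impl Default for RetryConfig {
    fn default() -> Self {
        Self { max_attempts: Some(10), initial_delay: Duration::from_millis(500), max_delay: Duration::from_secs(30) }
    }
}

impl RetryConfig {
    pub fn fast() -> Self {
        Self { max_attempts: Some(5), initial_delay: Duration::from_millis(100), max_delay: Duration::from_secs(5) }
    }

    pub fn slow() -> Self {
        Self { max_attempts: Some(15), initial_delay: Duration::from_secs(1), max_delay: Duration::from_secs(60) }
    }

    /// Assumption: reuses the default delay bounds, with no attempt limit.
    pub fn infinite() -> Self {
        Self { max_attempts: None, ..Self::default() }
    }
}

/// Calls `op` with the 1-based attempt number until it succeeds or the budget
/// runs out, doubling the delay (up to `max_delay`) after each failure.
/// Returns the last error if every attempt fails. Jitter is omitted for brevity.
pub fn retry<T, E, F>(config: RetryConfig, mut op: F) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    let mut delay = config.initial_delay;
    let mut attempt: u32 = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(e) => {
                if config.max_attempts.map_or(false, |max| attempt >= max) {
                    return Err(e);
                }
            }
        }
        thread::sleep(delay);
        delay = delay.saturating_mul(2).min(config.max_delay);
        attempt = attempt.saturating_add(1);
    }
}
```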

Logging

The service uses structured logging with multiple levels:

  • INFO: Normal operation events and health reports
  • WARN: Service health issues and recoverable errors
  • ERROR: Critical failures and persistent issues
  • DEBUG: Detailed operation information

Set the RUST_LOG environment variable to control log levels:

export RUST_LOG=info  # or debug, warn, error

Development

Project Structure

src/
├── main.rs              # Application entry point
├── lib.rs               # Library exports
├── clash_conn_msg.rs    # Clash message structures
├── health_monitor.rs    # Health monitoring system
├── retry.rs             # Retry mechanism implementation
├── statistics.rs        # Traffic statistics processing
├── udp_server.rs        # UDP broadcasting server
└── wan.rs               # WAN traffic polling
tests/
└── integration_test.rs  # Integration tests

Running Tests

# Run all tests
cargo test

# Run with output
cargo test -- --nocapture

# Run specific test
cargo test test_network_failure_recovery

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Ensure all tests pass
  6. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Troubleshooting

Common Issues

  1. WebSocket Connection Failures

    • Verify Clash is running and accessible
    • Check the WebSocket URL and authentication token
    • Ensure network connectivity to the Clash instance
  2. LuCI Authentication Failures

    • Verify router credentials
    • Check if the router is accessible
    • Ensure the LuCI interface is enabled
  3. UDP Client Connection Issues

    • Verify the UDP port is not blocked by firewall
    • Check if the service is binding to the correct interface
    • Ensure clients are connecting to the correct port
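To check item 3 from a client machine, a small probe can send one datagram to the service and wait for a reply. This sketch assumes the server starts sending to a client after receiving any datagram from it. The `hello` payload and the `probe` function are hypothetical.

```rust
use std::io;
use std::net::UdpSocket;
use std::time::Duration;

/// Sends a placeholder datagram to `server` and waits up to `timeout` for any reply.
/// Returns the size of the first datagram received (per the data formats above,
/// 32 bytes suggests Clash data and 16 bytes WAN data).
pub fn probe(server: &str, timeout: Duration) -> io::Result<usize> {
    let socket = UdpSocket::bind("0.0.0.0:0")?; // ephemeral local port
    socket.set_read_timeout(Some(timeout))?;
    socket.send_to(b"hello", server)?; // assumption: any datagram registers the client
    let mut buf = [0u8; 1500];
    let (len, _from) = socket.recv_from(&mut buf)?;
    Ok(len)
}
```

For example, `probe("<service-host>:17890", Duration::from_secs(5))` should return 16 or 32 once data is flowing. A timeout surfaces as an `io::Error`, typically of kind `WouldBlock` on Unix-like systems and `TimedOut` on Windows.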

Debug Mode

Enable debug logging for detailed troubleshooting:

RUST_LOG=debug cargo run

This will provide detailed information about:

  • Connection attempts and failures
  • Retry mechanisms in action
  • Health monitoring decisions
  • UDP client management
  • Data processing and broadcasting