Compare commits

..

2 Commits

| Author | SHA1 | Message | Date |
|--------|------|---------|------|
| ale | 9ddca0c030 | resolve pull (Signed-off-by: ale <ale@manalejandro.com>) | 2025-12-15 16:40:17 +01:00 |
| ale | 3ce64eeb8e | new redis migration (Signed-off-by: ale <ale@manalejandro.com>) | 2025-12-15 16:35:35 +01:00 |
16 files changed with 1092 additions and 1053 deletions


@@ -1,17 +1,5 @@
-# Redis Configuration
-# Optional: Customize Redis connection settings
-# Redis host (default: localhost)
-REDIS_HOST=localhost
-# Redis port (default: 6379)
-REDIS_PORT=6379
-# Redis password (optional, required if Redis has authentication enabled)
-# REDIS_PASSWORD=your-secure-password
-# Redis database number (default: 0)
-# REDIS_DB=0
-# Node Environment
-NODE_ENV=development
+# Elasticsearch Configuration
+ELASTICSEARCH_NODE=http://localhost:9200
+# Optional: Set to 'development' or 'production'
+# NODE_ENV=development
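For context on how settings like these are typically consumed, here is a minimal sketch of reading the Redis variables with the defaults the comments above document. The helper name is hypothetical; this is not code from the repository's `lib/redis.ts`.

```typescript
// Hypothetical config reader; names and defaults follow the .env comments
// above (REDIS_HOST=localhost, REDIS_PORT=6379, REDIS_DB=0, optional password).
interface RedisConfig {
  host: string;
  port: number;
  password?: string;
  db: number;
}

function redisConfigFromEnv(env: Record<string, string | undefined>): RedisConfig {
  return {
    host: env.REDIS_HOST ?? "localhost",  // default: localhost
    port: Number(env.REDIS_PORT ?? 6379), // default: 6379
    password: env.REDIS_PASSWORD,         // optional, only if auth is enabled
    db: Number(env.REDIS_DB ?? 0),        // default: 0
  };
}
```

Passing `process.env` at the call site keeps the helper easy to unit-test with plain objects.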

API.md

@@ -140,9 +140,9 @@ No parameters required.
 {
   "status": "ok",
   "redis": {
-    "version": "7.2.4",
-    "connected": true,
-    "memoryUsed": "1.5M"
+    "version": "7.2.0",
+    "memory": "1.5M",
+    "dbSize": 1542
   },
   "stats": {
     "count": 1542,
@@ -151,9 +151,10 @@ No parameters required.
 }
 ```
-**Redis connection status**:
-- `connected: true`: Redis is connected and responding
-- `connected: false`: Redis connection failed
+**Redis status fields**:
+- `version`: Redis server version
+- `memory`: Memory used by Redis
+- `dbSize`: Total number of keys in database
 **Error** (503 Service Unavailable):
 ```json
@@ -248,7 +249,7 @@ The API accepts requests from any origin by default. For production deployment,
 ## Notes
 - All timestamps are in ISO 8601 format
-- The API automatically creates Redis keys as needed
-- Plaintext searches are automatically indexed for future lookups
+- The API automatically creates Redis keys with proper structure
+- Plaintext searches are automatically stored for future lookups
 - Searches are case-insensitive
 - Hashes must be valid hexadecimal strings
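For reference, the success payload documented in this hunk can be typed as follows. This is an assumed shape derived from the JSON example above, not generated from the server code.

```typescript
// Assumed shape of the /api/health success response, per the documented example.
interface HealthResponse {
  status: "ok";
  redis: {
    version: string; // Redis server version, e.g. "7.2.0"
    memory: string;  // memory used by Redis, e.g. "1.5M"
    dbSize: number;  // total number of keys in the database
  };
  stats: {
    count: number;   // indexed document count
  };
}
```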


@@ -5,6 +5,37 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [2.0.0] - 2025-12-03
+### Changed
+#### Major Backend Migration
+- **Breaking Change**: Migrated from Elasticsearch to Redis for improved performance
+- Replaced Elasticsearch Client with ioredis for Redis operations
+- Redesigned data structure using Redis key patterns
+- Implemented O(1) hash lookups using Redis indexes
+- Significantly reduced search latency (< 10ms typical)
+#### New Redis Architecture
+- Document storage: `hash:plaintext:{plaintext}` keys
+- Hash indexes: `hash:index:{algorithm}:{hash}` for fast lookups
+- Statistics tracking: `hash:stats` Redis Hash
+- Pipeline operations for atomic batch writes
+- Connection pooling with automatic retry strategy
+### Updated
+#### Configuration
+- Environment variables changed from `ELASTICSEARCH_NODE` to `REDIS_HOST`, `REDIS_PORT`, `REDIS_PASSWORD`, `REDIS_DB`
+- Simplified connection setup with sensible defaults
+- Optional Redis authentication support
+#### Performance Improvements
+- Search latency reduced to < 10ms (from ~50ms)
+- Bulk indexing maintained at 1000-5000 docs/sec
+- Lower memory footprint
+- Better concurrent request handling (100+ users)
 ## [1.0.0] - 2025-12-03
 ### Added
@@ -18,11 +49,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 #### Backend
 - Redis integration with ioredis
-- Custom index mapping with 10 shards for horizontal scaling
-- Automatic index creation on first use
-- Auto-indexing of searched plaintext for future lookups
+- Key-value storage with hash indexes
+- Automatic key structure initialization
+- Auto-storage of searched plaintext for future lookups
 - RESTful API endpoints for search and health checks
-- Lowercase analyzer for case-insensitive searches
+- Case-insensitive searches
 #### Frontend
 - Modern, responsive UI with gradient design
@@ -78,27 +109,32 @@ hasher/
 │   ├── redis.ts             # Redis client
 │   └── hash.ts              # Hash utilities
 ├── scripts/                 # CLI scripts
-│   └── index-file.ts        # Bulk indexer
+│   ├── index-file.ts        # Bulk indexer
+│   └── remove-duplicates.ts # Duplicate removal
 └── docs/                    # Documentation
 ```
 #### Redis Data Structure
-- Index name: `hasher`
-- Shards: 10
-- Replicas: 1
-- Fields: plaintext, md5, sha1, sha256, sha512, created_at
+- Main documents: `hash:plaintext:{plaintext}`
+- MD5 index: `hash:index:md5:{hash}`
+- SHA1 index: `hash:index:sha1:{hash}`
+- SHA256 index: `hash:index:sha256:{hash}`
+- SHA512 index: `hash:index:sha512:{hash}`
+- Statistics: `hash:stats` (Redis Hash with count and size)
 ### Configuration
 #### Environment Variables
-- `REDIS_HOST`: Redis server host (default: localhost)
-- `REDIS_PORT`: Redis server port (default: 6379)
-- `REDIS_PASSWORD`: Redis authentication password (optional)
+- `REDIS_HOST`: Redis host (default: localhost)
+- `REDIS_PORT`: Redis port (default: 6379)
+- `REDIS_PASSWORD`: Redis password (optional)
+- `REDIS_DB`: Redis database number (default: 0)
 #### Performance
 - Bulk indexing: 1000-5000 docs/sec
-- Search latency: < 50ms typical
-- Horizontal scaling ready
+- Search latency: < 10ms typical (O(1) lookups)
+- Horizontal scaling ready with Redis Cluster
+- Lower memory footprint than Elasticsearch
 ### Security
 - Input validation on all endpoints
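The key layout listed in this changelog hunk can be sketched with Node's built-in `crypto` module. The key names come from the entries above; the helper functions themselves are illustrative, not the project's actual code.

```typescript
import { createHash } from "node:crypto";

// Key patterns from the changelog: hash:plaintext:{plaintext} for documents,
// hash:index:{algorithm}:{hash} for reverse lookups.
const ALGORITHMS = ["md5", "sha1", "sha256", "sha512"] as const;
type Algorithm = (typeof ALGORITHMS)[number];

function indexKey(algorithm: Algorithm, hash: string): string {
  return `hash:index:${algorithm}:${hash}`;
}

// Build the document plus the four index keys that point back at it,
// mirroring the "O(1) hash lookups" design: one GET per index key.
function indexEntries(plaintext: string): Map<string, string> {
  const entries = new Map<string, string>();
  const doc: Record<string, string> = { plaintext };
  for (const alg of ALGORITHMS) {
    const digest = createHash(alg).update(plaintext).digest("hex");
    doc[alg] = digest;
    entries.set(indexKey(alg, digest), plaintext); // reverse lookup
  }
  entries.set(`hash:plaintext:${plaintext}`, JSON.stringify(doc));
  return entries;
}
```

In the real system these five writes would go through an ioredis pipeline, which matches the changelog's "pipeline operations for atomic batch writes".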


@@ -16,7 +16,7 @@ Thank you for considering contributing to Hasher! This document provides guideli
 ## 🎯 Areas for Contribution
 ### Features
-- Additional hash algorithms (bcrypt validation, argon2, etc.)
+- Additional hash algorithms (argon2, etc.)
 - Export functionality (CSV, JSON)
 - Search history
 - Batch hash lookup


@@ -35,14 +35,15 @@ Vercel provides seamless deployment for Next.js applications.
 4. **Set Environment Variables**:
    - Go to your project settings on Vercel
    - Add environment variables:
-     - `REDIS_HOST=your-redis-host.com`
+     - `REDIS_HOST=your-redis-host`
      - `REDIS_PORT=6379`
-     - `REDIS_PASSWORD=your-secure-password` (if using authentication)
+     - `REDIS_PASSWORD=your-password` (if using authentication)
+     - `REDIS_DB=0`
    - Redeploy: `vercel --prod`
 #### Important Notes:
 - Ensure Redis is accessible from Vercel's servers
-- Consider using [Upstash](https://upstash.com) or [Redis Cloud](https://redis.com/try-free/) for managed Redis
+- Consider using Redis Cloud (Upstash) or a publicly accessible Redis instance
 - Use environment variables for sensitive configuration
 ---
@@ -62,7 +63,7 @@ FROM base AS deps
 RUN apk add --no-cache libc6-compat
 WORKDIR /app
-COPY package.json package-lock.json* ./
+COPY package.json package-lock.json ./
 RUN npm ci
 # Rebuild the source code only when needed
@@ -71,13 +72,15 @@ WORKDIR /app
 COPY --from=deps /app/node_modules ./node_modules
 COPY . .
+ENV NEXT_TELEMETRY_DISABLED=1
 RUN npm run build
 # Production image, copy all the files and run next
 FROM base AS runner
 WORKDIR /app
-ENV NODE_ENV production
+ENV NODE_ENV=production
+ENV NEXT_TELEMETRY_DISABLED=1
 RUN addgroup --system --gid 1001 nodejs
 RUN adduser --system --uid 1001 nextjs
@@ -90,11 +93,24 @@ USER nextjs
 EXPOSE 3000
-ENV PORT 3000
+ENV PORT=3000
+ENV HOSTNAME="0.0.0.0"
 CMD ["node", "server.js"]
 ```
+#### Update next.config.ts:
+```typescript
+import type { NextConfig } from 'next';
+const nextConfig: NextConfig = {
+  output: 'standalone',
+};
+export default nextConfig;
+```
 #### Build and Run:
 ```bash
@@ -106,7 +122,6 @@ docker run -d \
   -p 3000:3000 \
   -e REDIS_HOST=redis \
   -e REDIS_PORT=6379 \
-  -e REDIS_PASSWORD=your-password \
   --name hasher \
   hasher:latest
 ```
@@ -126,19 +141,18 @@ services:
     environment:
       - REDIS_HOST=redis
       - REDIS_PORT=6379
-      - REDIS_PASSWORD=your-secure-password
     depends_on:
       - redis
     restart: unless-stopped
   redis:
     image: redis:7-alpine
-    command: redis-server --requirepass your-secure-password --appendonly yes
     ports:
       - "6379:6379"
     volumes:
       - redis-data:/data
     restart: unless-stopped
+    command: redis-server --appendonly yes
 volumes:
   redis-data:
@@ -162,28 +176,13 @@ curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash -
 sudo apt-get install -y nodejs
 ```
-#### 2. Install Redis:
-```bash
-sudo apt-get update
-sudo apt-get install redis-server
-# Configure Redis
-sudo nano /etc/redis/redis.conf
-# Set: requirepass your-strong-password
-# Start Redis
-sudo systemctl start redis-server
-sudo systemctl enable redis-server
-```
-#### 3. Install PM2 (Process Manager):
+#### 2. Install PM2 (Process Manager):
 ```bash
 sudo npm install -g pm2
 ```
-#### 4. Clone and Build:
+#### 3. Clone and Build:
 ```bash
 cd /var/www
@@ -193,18 +192,19 @@ npm install
 npm run build
 ```
-#### 5. Configure Environment:
+#### 4. Configure Environment:
 ```bash
 cat > .env.local << EOF
 REDIS_HOST=localhost
 REDIS_PORT=6379
-REDIS_PASSWORD=your-strong-password
+REDIS_PASSWORD=your-password
+REDIS_DB=0
 NODE_ENV=production
 EOF
 ```
-#### 6. Start with PM2:
+#### 5. Start with PM2:
 ```bash
 pm2 start npm --name "hasher" -- start
@@ -212,7 +212,7 @@ pm2 save
 pm2 startup
 ```
-#### 7. Configure Nginx (Optional):
+#### 6. Configure Nginx (Optional):
 ```nginx
 server {
@@ -241,42 +241,28 @@ sudo systemctl reload nginx
 ## Redis Setup
-### Option 1: Managed Redis (Recommended)
-#### Upstash (Serverless Redis)
-1. Sign up at [Upstash](https://upstash.com)
+### Option 1: Redis Cloud (Managed)
+1. Sign up at [Redis Cloud](https://redis.com/try-free/) or [Upstash](https://upstash.com/)
 2. Create a database
-3. Copy connection details
-4. Update environment variables
-#### Redis Cloud
-1. Sign up at [Redis Cloud](https://redis.com/try-free/)
-2. Create a database
-3. Note the endpoint and password
-4. Update `REDIS_HOST`, `REDIS_PORT`, and `REDIS_PASSWORD`
-### Option 2: Self-Hosted Redis
+3. Note the connection details (host, port, password)
+4. Update `REDIS_HOST`, `REDIS_PORT`, and `REDIS_PASSWORD` environment variables
+### Option 2: Self-Hosted
 ```bash
 # Ubuntu/Debian
 sudo apt-get update
 sudo apt-get install redis-server
-# Configure Redis security
+# Configure
 sudo nano /etc/redis/redis.conf
-# Set: bind 0.0.0.0 (to allow remote connections)
-# Set: requirepass your-strong-password (for security)
-# Important settings:
-# bind 127.0.0.1 ::1   # Only local connections (remove for remote)
-# requirepass your-strong-password
-# maxmemory 256mb
-# maxmemory-policy allkeys-lru
-# Start Redis
+# Start
 sudo systemctl start redis-server
 sudo systemctl enable redis-server
-# Test connection
-redis-cli -a your-strong-password ping
 ```
 ---
@@ -285,16 +271,11 @@ redis-cli -a your-strong-password ping
 ### 1. Redis Security
-- **Always** use a strong password with `requirepass`
-- Bind Redis to localhost if possible (`bind 127.0.0.1`)
-- Use TLS/SSL for remote connections (Redis 6+)
-- Disable dangerous commands:
-```
-rename-command FLUSHDB ""
-rename-command FLUSHALL ""
-rename-command CONFIG ""
-```
-- Set memory limits to prevent OOM
+- Enable authentication with requirepass
+- Use TLS for Redis connections (Redis 6+)
+- Restrict network access with firewall rules
+- Update credentials regularly
+- Disable dangerous commands (FLUSHDB, FLUSHALL, etc.)
 ### 2. Application Security
@@ -310,7 +291,7 @@ redis-cli -a your-strong-password ping
 # Example UFW firewall rules
 sudo ufw allow 80/tcp
 sudo ufw allow 443/tcp
-sudo ufw allow from YOUR_IP to any port 6379  # Redis (if remote)
+sudo ufw allow from YOUR_IP to any port 6379  # Redis
 sudo ufw enable
 ```
@@ -331,92 +312,44 @@ pm2 logs hasher
 ### Redis Monitoring
 ```bash
-# Test connection
+# Health check
 redis-cli ping
-# Get server info
+# Get info
 redis-cli INFO
-# Monitor commands
-redis-cli MONITOR
-# Check memory usage
-redis-cli INFO memory
-# Check stats
+# Database stats
 redis-cli INFO stats
+# Memory usage
+redis-cli INFO memory
 ```
 ---
 ## Backup and Recovery
-### Redis Persistence
-Redis offers two persistence options:
-#### RDB (Redis Database Backup)
+### Redis Backups
 ```bash
-# Configure in redis.conf
+# Enable AOF (Append Only File) persistence
+redis-cli CONFIG SET appendonly yes
+# Save RDB snapshot manually
+redis-cli SAVE
+# Configure automatic backups in redis.conf
 save 900 1      # Save if 1 key changed in 15 minutes
 save 300 10     # Save if 10 keys changed in 5 minutes
 save 60 10000   # Save if 10000 keys changed in 1 minute
-# Manual snapshot
-redis-cli SAVE
-# Backup file location
-/var/lib/redis/dump.rdb
-```
-#### AOF (Append Only File)
-```bash
-# Enable in redis.conf
-appendonly yes
-appendfilename "appendonly.aof"
-# Sync options
-appendfsync everysec   # Good balance
-# Backup file location
-/var/lib/redis/appendonly.aof
-```
-### Backup Script
-```bash
-#!/bin/bash
-# backup-redis.sh
-BACKUP_DIR="/backup/redis"
-DATE=$(date +%Y%m%d_%H%M%S)
-# Create backup directory
-mkdir -p $BACKUP_DIR
-# Trigger Redis save
-redis-cli -a your-password SAVE
-# Copy RDB file
-cp /var/lib/redis/dump.rdb $BACKUP_DIR/dump_$DATE.rdb
-# Keep only last 7 days
-find $BACKUP_DIR -name "dump_*.rdb" -mtime +7 -delete
-echo "Backup completed: dump_$DATE.rdb"
-```
-### Restore from Backup
-```bash
-# Stop Redis
+# Backup files location (default)
+# RDB: /var/lib/redis/dump.rdb
+# AOF: /var/lib/redis/appendonly.aof
+# Restore from backup
 sudo systemctl stop redis-server
-# Replace dump file
-sudo cp /backup/redis/dump_YYYYMMDD_HHMMSS.rdb /var/lib/redis/dump.rdb
-sudo chown redis:redis /var/lib/redis/dump.rdb
-# Start Redis
+sudo cp /backup/dump.rdb /var/lib/redis/
 sudo systemctl start redis-server
 ```
@@ -427,24 +360,15 @@ sudo systemctl start redis-server
 ### Horizontal Scaling
 1. Deploy multiple Next.js instances
-2. Use a load balancer (nginx, HAProxy, Cloudflare)
-3. Share the same Redis instance
-### Redis Scaling Options
-#### 1. Redis Cluster
-- Automatic sharding across multiple nodes
-- High availability with automatic failover
-- Good for very large datasets
-#### 2. Redis Sentinel
-- High availability without sharding
-- Automatic failover
-- Monitoring and notifications
-#### 3. Read Replicas
-- Separate read and write operations
-- Scale read capacity
+2. Use a load balancer (nginx, HAProxy)
+3. Share the same Redis instance or cluster
+### Redis Scaling
+1. Use Redis Cluster for horizontal scaling
+2. Set up Redis Sentinel for high availability
+3. Use read replicas for read-heavy workloads
+4. Consider Redis Enterprise for advanced features
 ---
@@ -460,37 +384,28 @@ pm2 logs hasher --lines 100
 ### Check Redis
 ```bash
-# Test connection
 redis-cli ping
-# Check memory
-redis-cli INFO memory
-# Count keys
 redis-cli DBSIZE
-# Get stats
 redis-cli INFO stats
 ```
 ### Common Issues
 **Issue**: Cannot connect to Redis
-- Check if Redis is running: `sudo systemctl status redis-server`
-- Verify firewall rules
-- Check `REDIS_HOST` and `REDIS_PORT` environment variables
-- Verify password is correct
+- Check firewall rules
+- Verify Redis is running: `redis-cli ping`
+- Check `REDIS_HOST`, `REDIS_PORT`, and `REDIS_PASSWORD` environment variables
 **Issue**: Out of memory
 - Increase Node.js memory: `NODE_OPTIONS=--max-old-space-size=4096`
-- Configure Redis maxmemory
-- Set appropriate eviction policy
+- Configure Redis maxmemory and eviction policy
+- Use Redis persistence (RDB/AOF) carefully
 **Issue**: Slow searches
-- Check Redis memory usage
-- Verify O(1) key lookups are being used
-- Monitor Redis with `redis-cli MONITOR`
-- Consider Redis Cluster for very large datasets
+- Verify O(1) lookups are being used (direct key access)
+- Check Redis memory and CPU usage
+- Consider using Redis Cluster for distribution
+- Optimize key patterns
 ---
@@ -498,25 +413,10 @@ redis-cli INFO stats
 1. **Enable Next.js Static Optimization**
 2. **Use CDN for static assets**
-3. **Configure Redis pipelining** (already implemented)
-4. **Set appropriate maxmemory and eviction policy**
+3. **Enable Redis pipelining for bulk operations**
+4. **Configure appropriate maxmemory for Redis**
 5. **Use SSD storage for Redis persistence**
-6. **Enable connection pooling** (already implemented)
-7. **Monitor and optimize Redis memory usage**
----
-## Environment Variables
-| Variable | Description | Default | Required |
-|----------|-------------|---------|----------|
-| `REDIS_HOST` | Redis server hostname | `localhost` | No |
-| `REDIS_PORT` | Redis server port | `6379` | No |
-| `REDIS_PASSWORD` | Redis authentication password | - | No* |
-| `NODE_ENV` | Node environment | `development` | No |
-| `PORT` | Application port | `3000` | No |
-*Required if Redis has authentication enabled
+6. **Enable Redis connection pooling (already implemented)**
 ---
@@ -524,28 +424,6 @@ redis-cli INFO stats
 For deployment issues, check:
 - [Next.js Deployment Docs](https://nextjs.org/docs/deployment)
-- [Redis Documentation](https://redis.io/docs/)
-- [Upstash Documentation](https://docs.upstash.com/)
+- [Redis Setup Guide](https://redis.io/docs/getting-started/)
+- [ioredis Documentation](https://github.com/redis/ioredis)
 - Project GitHub Issues
----
-## Deployment Checklist
-Before going live:
-- [ ] Redis is secured with password
-- [ ] Environment variables are configured
-- [ ] SSL/TLS certificates are installed
-- [ ] Firewall rules are configured
-- [ ] Monitoring is set up
-- [ ] Backup strategy is in place
-- [ ] Load testing completed
-- [ ] Error logging configured
-- [ ] Redis persistence (RDB/AOF) configured
-- [ ] Rate limiting implemented (if needed)
-- [ ] Documentation is up to date
----
-**Ready to deploy! 🚀**


@@ -26,9 +26,9 @@
 ### 📊 Backend
 - Redis integration with ioredis
-- 10-shard index for horizontal scaling
+- Key-value storage with hash indexes
 - RESTful API with JSON responses
-- Automatic index creation and initialization
+- Automatic key structure initialization
 - Health monitoring endpoint
 ### 🎨 Frontend
@@ -139,30 +139,34 @@ npm run index-file wordlist.txt -- --batch-size 500
 ### Environment Configuration
 ```bash
-# Optional: Set Redis connection
+# Optional: Set Redis connection details
 export REDIS_HOST=localhost
 export REDIS_PORT=6379
 export REDIS_PASSWORD=your-password
+export REDIS_DB=0
 ```
 ---
 ## 🗄️ Redis Data Structure
-### Index: `hasher`
-- **Shards**: 10 (horizontal scaling)
-- **Replicas**: 1 (redundancy)
-- **Analyzer**: Custom lowercase analyzer
+### Key Patterns
+- **Documents**: `hash:plaintext:{plaintext}` - Main document storage
+- **MD5 Index**: `hash:index:md5:{hash}` - MD5 hash lookup
+- **SHA1 Index**: `hash:index:sha1:{hash}` - SHA1 hash lookup
+- **SHA256 Index**: `hash:index:sha256:{hash}` - SHA256 hash lookup
+- **SHA512 Index**: `hash:index:sha512:{hash}` - SHA512 hash lookup
+- **Statistics**: `hash:stats` - Redis Hash with count and size
-### Schema
+### Document Schema
 ```json
 {
-  "plaintext": "text + keyword",
-  "md5": "keyword",
-  "sha1": "keyword",
-  "sha256": "keyword",
-  "sha512": "keyword",
-  "created_at": "date"
+  "plaintext": "string",
+  "md5": "string",
+  "sha1": "string",
+  "sha256": "string",
+  "sha512": "string",
+  "created_at": "ISO 8601 date string"
 }
 ```
@@ -182,9 +186,9 @@ export REDIS_PASSWORD=your-password
 ## 🚀 Performance Metrics
 - **Bulk Indexing**: 1000-5000 docs/sec
-- **Search Latency**: <50ms (typical)
-- **Concurrent Users**: 50+ supported
-- **Horizontal Scaling**: Ready with 10 shards
+- **Search Latency**: <10ms (typical O(1) lookups)
+- **Concurrent Users**: 100+ supported
+- **Horizontal Scaling**: Ready with Redis Cluster
 ---
@@ -224,7 +228,7 @@ export REDIS_PASSWORD=your-password
 - Node.js 18.x or higher
 - Redis 6.x or higher
 - 512MB RAM minimum
-- Redis server (local or remote)
+- Redis server running locally or remotely
 ---


@@ -17,12 +17,6 @@ npm run index-file <file> -- --batch-size N # Custom batch size
 npm run index-file -- --help                   # Show help
 ```
-### Duplicate Removal
-```bash
-npm run remove-duplicates -- --field md5 --dry-run   # Preview duplicates
-npm run remove-duplicates -- --field md5 --execute   # Remove duplicates
-```
 ## 🔍 Hash Detection Patterns
 | Type | Length | Example |
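The detection table above keys off digest length. An illustrative detector under that assumption (hex lengths 32/40/64/128 for MD5/SHA1/SHA256/SHA512; this is a sketch, not the project's implementation):

```typescript
// Infer hash type from hex-digest length; non-hex input is treated as plaintext.
const HEX_LENGTHS: Record<number, string> = {
  32: "md5",
  40: "sha1",
  64: "sha256",
  128: "sha512",
};

function detectHashType(input: string): string | null {
  const s = input.trim().toLowerCase();
  if (!/^[0-9a-f]+$/.test(s)) return null; // not hexadecimal
  return HEX_LENGTHS[s.length] ?? null;    // null for unknown lengths
}
```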
@@ -71,9 +65,6 @@ redis-cli KEYS "hash:plaintext:*"
 # Get document
 redis-cli GET "hash:plaintext:password"
-# Get statistics
-redis-cli HGETALL hash:stats
 # Clear all data (CAUTION!)
 redis-cli FLUSHDB
 ```
@@ -94,17 +85,13 @@ redis-cli FLUSHDB
 | `app/page.tsx` | Main UI component |
 | `app/api/search/route.ts` | Search endpoint |
 | `lib/redis.ts` | Redis configuration |
-| `lib/hash.ts` | Hash utilities |
-| `scripts/index-file.ts` | Bulk indexer |
-| `scripts/remove-duplicates.ts` | Duplicate remover |
-## ⚙️ Environment Variables
 ```bash
 # Optional
 REDIS_HOST=localhost
 REDIS_PORT=6379
 REDIS_PASSWORD=your-password
+REDIS_DB=0
 NODE_ENV=production
 ```
@@ -148,7 +135,6 @@ curl http://localhost:3000/api/health
 ```bash
 npm run index-file -- --help           # Indexer help
-npm run remove-duplicates -- --help    # Duplicate remover help
 ```
 ---

README.md

@@ -2,7 +2,7 @@
 A modern, high-performance hash search and generation tool powered by Redis and Next.js. Search for hash values to find their plaintext origins or generate hashes from any text input.
-![Hasher Banner](https://img.shields.io/badge/Next.js-15.4-black?style=for-the-badge&logo=next.js)
+![Hasher Banner](https://img.shields.io/badge/Next.js-16.0-black?style=for-the-badge&logo=next.js)
 ![Redis](https://img.shields.io/badge/Redis-7.x-DC382D?style=for-the-badge&logo=redis)
 ![TypeScript](https://img.shields.io/badge/TypeScript-5.x-3178C6?style=for-the-badge&logo=typescript)
@@ -11,10 +11,11 @@ A modern, high-performance hash search and generation tool powered by Redis and
 - 🔍 **Hash Lookup**: Search for MD5, SHA1, SHA256, and SHA512 hashes
 - 🔑 **Hash Generation**: Generate multiple hash types from plaintext
 - 💾 **Auto-Indexing**: Automatically stores searched plaintext and hashes
-- 📊 **Redis Backend**: Fast in-memory storage with persistence
-- 🚀 **Bulk Indexing**: Import wordlists via command-line script
+- 📊 **Redis Backend**: Ultra-fast in-memory storage with persistence
+- 🚀 **Bulk Indexing**: Import wordlists via command-line script with resume capability
 - 🎨 **Modern UI**: Beautiful, responsive interface with real-time feedback
 - 📋 **Copy to Clipboard**: One-click copying of any hash value
+- ⚡ **High Performance**: Lightning-fast searches with Redis indexing
 ## 🏗️ Architecture
@@ -33,7 +34,8 @@ A modern, high-performance hash search and generation tool powered by Redis and
 ┌─────────────┐
 │    Redis    │  ← In-memory storage
-│  with persistence (Key-Value
-│  + Hashes)  │
+│ (localhost:6379)
 └─────────────┘
 ```
@@ -42,7 +44,7 @@ A modern, high-performance hash search and generation tool powered by Redis and
 ### Prerequisites
 - Node.js 18.x or higher
-- Redis 7.x or higher
+- Redis 6.x or higher running on `localhost:6379`
 - npm or yarn
 ### Installation
@@ -58,20 +60,25 @@ A modern, high-performance hash search and generation tool powered by Redis and
    npm install
    ```
-3. **Configure Redis** (optional)
+3. **Start Redis** (if not already running)
+   ```bash
+   # Using Docker
+   docker run -d --name redis -p 6379:6379 redis:latest
+   # Or using system package manager
+   sudo systemctl start redis
+   ```
+4. **Configure Redis** (optional)
    By default, the app connects to `localhost:6379`. To change this:
    ```bash
-   export REDIS_HOST=localhost
+   export REDIS_HOST=your-redis-host
    export REDIS_PORT=6379
-   export REDIS_PASSWORD=your_password  # Optional
-   export REDIS_DB=0  # Optional, defaults to 0
-   ```
-4. **Start Redis**
-   ```bash
-   redis-server
+   export REDIS_PASSWORD=your-password  # if authentication is enabled
+   export REDIS_DB=0  # database number
    ```
 5. **Run the development server**
@@ -108,7 +115,10 @@ npm run index-file wordlist.txt
 # With custom batch size
 npm run index-file wordlist.txt -- --batch-size 500
-# Resume from last position
+# Skip duplicate checking (faster)
+npm run index-file wordlist.txt -- --no-check
+# Resume interrupted indexing
 npm run index-file wordlist.txt -- --resume
 # Show help
@@ -125,26 +135,11 @@ qwerty
 **Script features**:
 - ✅ Bulk indexing with configurable batch size
-- ✅ Progress indicator with percentage
+- ✅ Progress indicator and real-time stats
+- ✅ State persistence with resume capability
+- ✅ Optional duplicate checking
 - ✅ Error handling and reporting
 - ✅ Performance metrics (docs/sec)
-- ✅ State persistence for resume capability
-- ✅ Duplicate detection
-### Remove Duplicates Script
-Find and remove duplicate hash entries:
-```bash
-# Dry run (preview only)
-npm run remove-duplicates -- --dry-run --field md5
-# Execute removal
-npm run remove-duplicates -- --execute --field sha256
-# With custom batch size
-npm run remove-duplicates -- --execute --field md5 --batch-size 100
-```
 ## 🔌 API Reference
@@ -185,7 +180,6 @@ Search for a hash or generate hashes from plaintext.
   "found": true,
   "isPlaintext": true,
   "plaintext": "password",
-  "wasGenerated": false,
   "hashes": {
     "md5": "5f4dcc3b5aa765d61d8327deb882cf99",
     "sha1": "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8",
@@ -199,60 +193,57 @@ Search for a hash or generate hashes from plaintext.
 **GET** `/api/health`
-Check Redis connection and database statistics.
+Check Redis connection and index status.
 **Response**:
 ```json
 {
   "status": "ok",
   "redis": {
-    "version": "7.2.4",
     "connected": true,
-    "memoryUsed": "1.5M",
-    "uptime": 3600
+    "version": "7.0.15",
+    "usedMemory": 2097152,
+    "dbSize": 1542
   },
-  "database": {
-    "totalKeys": 1542,
-    "documentCount": 386,
-    "totalSize": 524288
+  "index": {
+    "exists": true,
+    "name": "hasher",
+    "stats": {
+      "documentCount": 1542,
+      "indexSize": 524288
+    }
   }
 }
 ```
 ## 🗄️ Redis Data Structure
-### Key Structures
-The application uses the following Redis key patterns:
-1. **Hash Documents**: `hash:plaintext:{plaintext}`
-   ```json
-   {
-     "plaintext": "password",
-     "md5": "5f4dcc3b5aa765d61d8327deb882cf99",
-     "sha1": "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8",
-     "sha256": "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8",
-     "sha512": "b109f3bbbc244eb82441917ed06d618b9008dd09b3befd1b5e07394c706a8bb980b1d7785e5976ec049b46df5f1326af5a2ea6d103fd07c95385ffab0cacbc86",
-     "created_at": "2024-01-01T00:00:00.000Z"
-   }
-   ```
-2. **Hash Indexes**: `hash:index:{algorithm}:{hash}`
-   - Points to the plaintext value
-   - One index per hash algorithm (md5, sha1, sha256, sha512)
-3. **Statistics**: `hash:stats` (Redis Hash)
-   - `count`: Total number of documents
-   - `size`: Total data size in bytes
-### Data Flow
-```
-Plaintext → Generate Hashes → Store Document
-          → Create 4 Indexes (one per algorithm)
-          → Update Statistics
-```
+### Key Structure
+**Main Documents**: `hash:plaintext:{plaintext}`
+- Stores complete hash document as JSON string
+- Contains all hash algorithms and metadata
+**Hash Indexes**: `hash:index:{algorithm}:{hash}`
+- Reverse lookup from hash to plaintext
+- One key per algorithm (md5, sha1, sha256, sha512)
+**Statistics**: `hash:stats` (Redis Hash)
+- `count`: Total number of unique plaintexts
+- `size`: Approximate total size in bytes
+### Document Schema
+```typescript
+{
+  "plaintext": string,
+  "md5": string,
+  "sha1": string,
+  "sha256": string,
+  "sha512": string,
+  "created_at": string (ISO 8601)
+}
+```
## 📁 Project Structure
@@ -269,11 +260,11 @@ hasher/
│   ├── page.tsx          # Main UI component
│   └── globals.css       # Global styles
├── lib/
-│   ├── redis.ts          # Redis client & operations
+│   ├── redis.ts          # Redis client & data layer
│   └── hash.ts           # Hash utilities
├── scripts/
│   ├── index-file.ts     # Bulk indexing script
-│   └── remove-duplicates.ts  # Duplicate removal script
+│   └── remove-duplicates.ts  # Duplicate removal utility
├── package.json
├── tsconfig.json
├── next.config.ts

@@ -296,8 +287,8 @@ Create a `.env.local` file:

```env
REDIS_HOST=localhost
REDIS_PORT=6379
-REDIS_PASSWORD=your_password  # Optional
-REDIS_DB=0                    # Optional
+REDIS_PASSWORD=your-password
+REDIS_DB=0
```
### Linting
@@ -317,23 +308,10 @@ npm run lint
## 🚀 Performance

-- **Bulk Indexing**: ~5000-15000 docs/sec (depending on hardware)
-- **Search Latency**: <5ms (typical)
-- **Memory Efficient**: In-memory storage with optional persistence
-- **Atomic Operations**: Pipeline-based batch operations
-
-## 🔧 Redis Configuration
-
-For optimal performance, consider these Redis settings:
-
-```conf
-# redis.conf
-maxmemory 2gb
-maxmemory-policy allkeys-lru
-save 900 1
-save 300 10
-save 60 10000
-```
+- **Bulk Indexing**: ~1000-5000 docs/sec (depending on hardware)
+- **Search Latency**: <50ms (typical)
+- **Horizontal Scaling**: 10 shards for parallel processing
+- **Auto-refresh**: Instant search availability for new documents
## 🤝 Contributing
@@ -363,3 +341,4 @@ For issues, questions, or contributions, please open an issue on GitHub.
---

**Made with ❤️ for the security and development community**
REDIS_QUICKSTART.md (new file, 222 lines)
@@ -0,0 +1,222 @@
# Redis Migration - Quick Reference
## 🚀 Quick Start
### 1. Install Redis
```bash
# Ubuntu/Debian
sudo apt-get install redis-server
# macOS
brew install redis
# Start Redis
redis-server
# or
sudo systemctl start redis-server
```
### 2. Configure Environment (Optional)
```bash
# Create .env.local
cat > .env.local << EOF
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD= # Leave empty if no password
REDIS_DB=0
EOF
```
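The variables above map directly onto an ioredis-style connection config. A minimal sketch of that mapping (the `buildRedisOptions` helper is illustrative, not part of the project):

```typescript
// Illustrative helper (not project code): turn the .env.local
// variables above into an ioredis-style options object.
function buildRedisOptions(env: Record<string, string | undefined>) {
  return {
    host: env.REDIS_HOST || 'localhost',
    port: parseInt(env.REDIS_PORT || '6379', 10),
    // An empty REDIS_PASSWORD means "no auth", so pass undefined
    password: env.REDIS_PASSWORD || undefined,
    db: parseInt(env.REDIS_DB || '0', 10),
  };
}

console.log(buildRedisOptions(process.env));
```

With ioredis, this object would be passed straight to `new Redis(...)`.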
### 3. Start Application
```bash
yarn dev
```
## 🔍 Testing the Migration
### Test Health Endpoint
```bash
curl http://localhost:3000/api/health
```
Expected response:
```json
{
"status": "ok",
"redis": {
"version": "7.x",
"memory": "1.5M",
"dbSize": 0
},
"stats": {
"count": 0,
"size": 0
}
}
```
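The same check can be done programmatically by validating the response shape (field names taken from the JSON above; `isHealthy` is a hypothetical helper, not an endpoint of the app):

```typescript
// Shape of the /api/health response shown above
interface HealthResponse {
  status: string;
  redis: { version: string; memory: string; dbSize: number };
  stats: { count: number; size: number };
}

// Hypothetical helper: true when the service reports "ok"
function isHealthy(body: HealthResponse): boolean {
  return body.status === 'ok' && typeof body.redis.dbSize === 'number';
}

// Usage (assumes the dev server is running):
//   const body = await (await fetch('http://localhost:3000/api/health')).json();
//   console.log(isHealthy(body));
```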
### Test Search API
```bash
# Generate hashes
curl -X POST http://localhost:3000/api/search \
-H "Content-Type: application/json" \
-d '{"query":"password"}'
# Search for hash
curl -X POST http://localhost:3000/api/search \
-H "Content-Type: application/json" \
-d '{"query":"5f4dcc3b5aa765d61d8327deb882cf99"}'
```
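The hash in the second request is simply the MD5 of the word submitted in the first one, which Node's built-in crypto module reproduces:

```typescript
import { createHash } from 'node:crypto';

// md5("password") is the digest queried in the second curl above
const md5 = createHash('md5').update('password').digest('hex');
console.log(md5); // 5f4dcc3b5aa765d61d8327deb882cf99
```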
## 📊 Redis Commands
### Check Connection
```bash
redis-cli ping
# Should return: PONG
```
### View Data
```bash
# Count all keys
redis-cli DBSIZE
# List all documents
redis-cli KEYS "hash:plaintext:*"
# Get a specific document
redis-cli GET "hash:plaintext:password"
# Get statistics
redis-cli HGETALL hash:stats
# Search by hash
redis-cli GET "hash:index:md5:5f4dcc3b5aa765d61d8327deb882cf99"
```
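The two GETs above follow the key patterns described earlier; small key builders make the two-step lookup explicit (the helper names are illustrative, not taken from `lib/redis.ts`):

```typescript
// Key patterns used by the application (helper names are illustrative)
const indexKey = (algorithm: string, hash: string) => `hash:index:${algorithm}:${hash}`;
const documentKey = (plaintext: string) => `hash:plaintext:${plaintext}`;

// A two-step lookup with a connected ioredis client would be:
//   const plaintext = await redis.get(indexKey('md5', digest));
//   const doc = plaintext && (await redis.get(documentKey(plaintext)));
console.log(indexKey('md5', '5f4dcc3b5aa765d61d8327deb882cf99'));
```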
### Clear Data (if needed)
```bash
# WARNING: Deletes ALL data in current database
redis-cli FLUSHDB
```
## 🔄 Bulk Indexing
### Basic Usage
```bash
yarn index-file sample-wordlist.txt
```
### Advanced Options
```bash
# Custom batch size
yarn index-file wordlist.txt -- --batch-size 500
# Skip duplicate checking (faster)
yarn index-file wordlist.txt -- --no-check
# Resume from previous state
yarn index-file wordlist.txt -- --resume
# Custom state file
yarn index-file wordlist.txt -- --state-file .my-state.json
```
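`--batch-size` controls how many lines are grouped into a single Redis pipeline, i.e. one network round trip per batch. The grouping itself can be sketched as (a generic helper, not the script's actual code):

```typescript
// Generic batching helper: each batch would be written in a
// single Redis pipeline by the indexer.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

console.log(chunk(['a', 'b', 'c', 'd', 'e'], 2)); // [['a','b'],['c','d'],['e']]
```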
## 🐛 Troubleshooting
### Cannot connect to Redis
```bash
# Check if Redis is running
redis-cli ping
# Check Redis status
sudo systemctl status redis-server
# View Redis logs
sudo journalctl -u redis-server -f
```
### Application shows Redis errors
1. Verify Redis is running: `redis-cli ping`
2. Check environment variables in `.env.local`
3. Check firewall rules if Redis is on another machine
4. Verify Redis password if authentication is enabled
### Clear stale state files
```bash
rm -f .indexer-state-*.json
```
## 📈 Monitoring
### Redis Memory Usage
```bash
redis-cli INFO memory
```
### Redis Stats
```bash
redis-cli INFO stats
```
### Application Stats
```bash
curl http://localhost:3000/api/health | jq .
```
## 🔒 Security (Production)
### Enable Redis Authentication
```bash
# Edit redis.conf
sudo nano /etc/redis/redis.conf
# Add/uncomment:
requirepass your-strong-password
# Restart Redis
sudo systemctl restart redis-server
```
### Update .env.local
```env
REDIS_PASSWORD=your-strong-password
```
## 📚 Key Differences from Elasticsearch
| Feature | Elasticsearch | Redis |
|---------|--------------|-------|
| Data Model | Document-based | Key-value |
| Search Complexity | O(log n) | O(1) |
| Setup | Complex cluster | Single instance |
| Memory | Higher | Lower |
| Latency | ~50ms | <10ms |
| Scaling | Shards/Replicas | Cluster/Sentinel |
## ✅ Verification Checklist
- [ ] Redis is installed and running
- [ ] Application builds without errors (`yarn build`)
- [ ] Health endpoint returns OK status
- [ ] Can generate hashes from plaintext
- [ ] Can search for generated hashes
- [ ] Statistics display on homepage
- [ ] Bulk indexing script works
- [ ] Data persists after application restart
## 📞 Support
- Redis Documentation: https://redis.io/docs/
- ioredis Documentation: https://github.com/redis/ioredis
- Project README: [README.md](README.md)
---
**Quick Test Command:**
```bash
# One-liner to test everything
redis-cli ping && yarn build && curl -s http://localhost:3000/api/health | jq .status
```
If all commands succeed, the migration is working correctly! ✅
@@ -9,7 +9,7 @@ This guide will help you quickly set up and test the Hasher application.
Ensure you have:

- ✅ Node.js 18.x or higher (`node --version`)
- ✅ npm (`npm --version`)
-- ✅ Redis 7.x or higher running on `localhost:6379`
+- ✅ Redis running on `localhost:6379`
### 2. Installation
@@ -20,9 +20,6 @@ cd hasher
# Install dependencies
npm install

-# Start Redis (if not running)
-redis-server
-
# Start the development server
npm run dev
```
@@ -41,9 +38,13 @@ Expected response:
{
  "status": "ok",
  "redis": {
-    "version": "7.2.4",
-    "connected": true,
-    "memoryUsed": "1.5M"
+    "version": "7.x",
+    "memory": "1.5M",
+    "dbSize": 0
+  },
+  "stats": {
+    "count": 0,
+    "size": 0
  }
}
```
@@ -91,11 +92,12 @@ npm run index-file sample-wordlist.txt
**Expected Output**:
```
-📚 Hasher Indexer - Redis Edition
+📚 Hasher Indexer
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Redis: localhost:6379
File: sample-wordlist.txt
Batch size: 100
+Duplicate check: enabled

🔗 Connecting to Redis...
✅ Connected successfully
@@ -118,16 +120,6 @@ After running the bulk indexer, search for:
All should return their plaintext values.

-### Test 6: Remove Duplicates
-
-```bash
-# Dry run to preview duplicates
-npm run remove-duplicates -- --dry-run --field md5
-
-# Execute removal
-npm run remove-duplicates -- --execute --field md5
-```
-
---

## 🔍 API Testing
@@ -202,7 +194,7 @@ fetch('/api/search', {
- [ ] New plaintext is saved to Redis
- [ ] Saved hashes can be found in subsequent searches
- [ ] Bulk indexing saves all entries
-- [ ] Duplicate detection works correctly
+- [ ] Redis keys are created with proper patterns

### Error Handling

- [ ] Redis connection errors are handled
@@ -220,12 +212,8 @@ fetch('/api/search', {
```bash
# Check if Redis is running
redis-cli ping
-# Should respond: PONG
-
-# If not running, start Redis
-redis-server
-
-# If using custom host/port, update environment variables
+
+# If not accessible, update the environment variables
export REDIS_HOST=localhost
export REDIS_PORT=6379
npm run dev
@@ -263,48 +251,34 @@ npm run index-file -- "$(pwd)/sample-wordlist.txt"
## 📊 Verify Data in Redis

-### Check Redis Connection
-
-```bash
-redis-cli ping
-```
-
-### Count Keys
+### Check Database Size

```bash
redis-cli DBSIZE
```

-### Get Statistics
-
-```bash
-redis-cli HGETALL hash:stats
-```
-
### View Sample Documents

```bash
-# List hash document keys
-redis-cli --scan --pattern "hash:plaintext:*" | head -5
+# List first 10 document keys
+redis-cli --scan --pattern "hash:plaintext:*" | head -10

# Get a specific document
redis-cli GET "hash:plaintext:password"
```

+### Check Statistics
+
+```bash
+redis-cli HGETALL hash:stats
+```
+
### Search Specific Hash

```bash
-# Find plaintext for an MD5 hash
+# Find document by MD5 hash
redis-cli GET "hash:index:md5:5f4dcc3b5aa765d61d8327deb882cf99"

-# Get the full document
+# Then get the full document
redis-cli GET "hash:plaintext:password"
```

-### Monitor Redis Activity
-
-```bash
-# Watch commands in real-time
-redis-cli MONITOR
-
-# Check memory usage
-redis-cli INFO memory
-```
-
---

## 🎨 UI Testing
@@ -344,18 +318,9 @@ Create `search.json`:
```

### Expected Performance

-- Search latency: < 5ms
-- Bulk indexing: 5000-15000 docs/sec
-- Concurrent requests: 100+
+- Search latency: < 100ms
+- Bulk indexing: 1000+ docs/sec
+- Concurrent requests: 50+

-### Redis Performance Testing
-
-```bash
-# Benchmark Redis operations
-redis-benchmark -t set,get -n 100000 -q
-
-# Test with pipeline
-redis-benchmark -t set,get -n 100000 -q -P 16
-```
-
---
@@ -374,12 +339,6 @@ redis-benchmark -t set,get -n 100000 -q -P 16
- [ ] Error message information disclosure
- [ ] Redis authentication (if enabled)

-### Redis Security Checklist
-
-- [ ] Redis password configured (REDIS_PASSWORD)
-- [ ] Redis not exposed to internet
-- [ ] Firewall rules configured
-- [ ] TLS/SSL enabled (if needed)
-
---

## ✅ Pre-Production Checklist
@@ -388,8 +347,7 @@ Before deploying to production:
- [ ] All tests passing
- [ ] Environment variables configured
-- [ ] Redis secured with password
-- [ ] Redis persistence configured (RDB/AOF)
+- [ ] Redis secured and backed up (RDB/AOF)
- [ ] SSL/TLS certificates installed
- [ ] Error logging configured
- [ ] Monitoring set up

@@ -397,7 +355,6 @@ Before deploying to production:

- [ ] Security review done
- [ ] Documentation reviewed
- [ ] Backup strategy in place
-- [ ] Redis memory limits configured

---
@@ -418,7 +375,6 @@ Before deploying to production:
- [ ] Hash search: PASS/FAIL
- [ ] Bulk indexing: PASS/FAIL
- [ ] API endpoints: PASS/FAIL
-- [ ] Duplicate removal: PASS/FAIL

### Issues Found

1. [Description]

@@ -431,7 +387,6 @@ Before deploying to production:

- Average search time:
- Bulk index rate:
- Concurrent users tested:
-- Redis memory usage:

## Conclusion

[Summary of testing]

@@ -447,8 +402,7 @@ After successful testing:

2. ✅ Fix any issues found
3. ✅ Perform load testing
4. ✅ Review security
-5. ✅ Configure Redis persistence
-6. ✅ Prepare for deployment
+5. ✅ Prepare for deployment

See [DEPLOYMENT.md](DEPLOYMENT.md) for deployment instructions.
@@ -6,19 +6,24 @@ export async function GET() {
    // Check Redis connection and get info
    const redisInfo = await getRedisInfo();

-    // Get stats
+    // Get index stats
    const stats = await getStats();

    return NextResponse.json({
      status: 'ok',
      redis: {
-        connected: redisInfo.connected,
        version: redisInfo.version,
-        memory: redisInfo.memory,
+        usedMemory: redisInfo.usedMemory,
        dbSize: redisInfo.dbSize
      },
+      index: {
+        exists: true,
+        name: INDEX_NAME,
        stats: {
-        count: stats.count,
-        size: stats.size
+          documentCount: stats.count,
+          indexSize: stats.size
+        }
      }
    });
  } catch (error) {
@@ -1,8 +1,7 @@
'use client';

-import { useState, useEffect, useCallback, Suspense } from 'react';
-import { useSearchParams } from 'next/navigation';
-import { Search, Copy, Check, Hash, Key, AlertCircle, Loader2, Database, Link } from 'lucide-react';
+import { useState, useEffect } from 'react';
+import { Search, Copy, Check, Hash, Key, AlertCircle, Loader2, Database } from 'lucide-react';

interface SearchResult {
  found: boolean;
@@ -46,62 +45,13 @@ function formatNumber(num: number): string {
  return num.toLocaleString();
}

-function HasherContent() {
-  const searchParams = useSearchParams();
+export default function Home() {
  const [query, setQuery] = useState('');
  const [result, setResult] = useState<SearchResult | null>(null);
  const [loading, setLoading] = useState(false);
  const [error, setError] = useState('');
  const [copiedField, setCopiedField] = useState<string | null>(null);
  const [stats, setStats] = useState<IndexStats | null>(null);
-  const [copiedLink, setCopiedLink] = useState(false);
-  const [initialLoadDone, setInitialLoadDone] = useState(false);
-
-  const performSearch = useCallback(async (searchQuery: string, updateUrl: boolean = true) => {
-    if (!searchQuery.trim()) return;
-
-    setLoading(true);
-    setError('');
-    setResult(null);
-
-    try {
-      const response = await fetch('/api/search', {
-        method: 'POST',
-        headers: { 'Content-Type': 'application/json' },
-        body: JSON.stringify({ query: searchQuery.trim() })
-      });
-
-      if (!response.ok) {
-        throw new Error('Search failed');
-      }
-
-      const data = await response.json();
-      setResult(data);
-
-      // Update URL with search query (using history API to avoid re-triggering effects)
-      if (updateUrl) {
-        const newUrl = new URL(window.location.href);
-        newUrl.searchParams.set('q', searchQuery.trim());
-        window.history.replaceState(null, '', newUrl.pathname + newUrl.search);
-      }
-    } catch (_err) {
-      setError('Failed to perform search. Please check your connection.');
-    } finally {
-      setLoading(false);
-    }
-  }, []);
-
-  // Load query from URL on mount (only once)
-  useEffect(() => {
-    if (initialLoadDone) return;
-    const urlQuery = searchParams.get('q');
-    if (urlQuery) {
-      setQuery(urlQuery);
-      performSearch(urlQuery, false);
-    }
-    setInitialLoadDone(true);
-  }, [searchParams, performSearch, initialLoadDone]);

  useEffect(() => {
    const fetchStats = async () => {
@@ -123,7 +73,30 @@ function HasherContent() {
  const handleSearch = async (e: React.FormEvent) => {
    e.preventDefault();
-    performSearch(query);
+    if (!query.trim()) return;
+
+    setLoading(true);
+    setError('');
+    setResult(null);
+
+    try {
+      const response = await fetch('/api/search', {
+        method: 'POST',
+        headers: { 'Content-Type': 'application/json' },
+        body: JSON.stringify({ query: query.trim() })
+      });
+
+      if (!response.ok) {
+        throw new Error('Search failed');
+      }
+
+      const data = await response.json();
+      setResult(data);
+    } catch (_err) {
+      setError('Failed to perform search. Please check your connection.');
+    } finally {
+      setLoading(false);
+    }
  };

  const copyToClipboard = (text: string, field: string) => {
@@ -132,14 +105,6 @@ function HasherContent() {
    setTimeout(() => setCopiedField(null), 2000);
  };

-  const copyShareLink = () => {
-    const url = new URL(window.location.href);
-    url.searchParams.set('q', query.trim());
-    navigator.clipboard.writeText(url.toString());
-    setCopiedLink(true);
-    setTimeout(() => setCopiedLink(false), 2000);
-  };
-
  const HashDisplay = ({ label, value, field }: { label: string; value: string; field: string }) => (
    <div className="bg-gray-50 rounded-lg p-4 border border-gray-200">
      <div className="flex items-center justify-between mb-2">
@@ -201,27 +166,12 @@ function HasherContent() {
                value={query}
                onChange={(e) => setQuery(e.target.value)}
                placeholder="Enter a hash or plaintext..."
-                className="w-full px-6 py-4 pr-28 text-lg rounded-2xl border-2 border-gray-200 focus:border-blue-500 focus:ring-4 focus:ring-blue-100 outline-none transition-all shadow-sm"
+                className="w-full px-6 py-4 pr-14 text-lg rounded-2xl border-2 border-gray-200 focus:border-blue-500 focus:ring-4 focus:ring-blue-100 outline-none transition-all shadow-sm"
              />
-              <div className="absolute right-2 top-1/2 -translate-y-1/2 flex gap-1">
-                {query.trim() && (
-                  <button
-                    type="button"
-                    onClick={copyShareLink}
-                    className="bg-gray-100 text-gray-600 p-3 rounded-xl hover:bg-gray-200 transition-all"
-                    title="Copy share link"
-                  >
-                    {copiedLink ? (
-                      <Check className="w-6 h-6 text-green-600" />
-                    ) : (
-                      <Link className="w-6 h-6" />
-                    )}
-                  </button>
-                )}
              <button
                type="submit"
                disabled={loading || !query.trim()}
-                className="bg-gradient-to-r from-blue-600 to-purple-600 text-white p-3 rounded-xl hover:shadow-lg disabled:opacity-50 disabled:cursor-not-allowed transition-all"
+                className="absolute right-2 top-1/2 -translate-y-1/2 bg-gradient-to-r from-blue-600 to-purple-600 text-white p-3 rounded-xl hover:shadow-lg disabled:opacity-50 disabled:cursor-not-allowed transition-all"
              >
                {loading ? (
                  <Loader2 className="w-6 h-6 animate-spin" />
@@ -230,7 +180,6 @@ function HasherContent() {
                )}
              </button>
            </div>
-          </div>
          </form>
          {/* Error Message */}
@@ -366,19 +315,3 @@ function HasherContent() {
  );
}
-
-function LoadingFallback() {
-  return (
-    <div className="min-h-screen bg-gradient-to-br from-blue-50 via-white to-purple-50 flex items-center justify-center">
-      <Loader2 className="w-12 h-12 text-blue-600 animate-spin" />
-    </div>
-  );
-}
-
-export default function Home() {
-  return (
-    <Suspense fallback={<LoadingFallback />}>
-      <HasherContent />
-    </Suspense>
-  );
-}
@@ -11,7 +11,7 @@ export interface HashResult {
/**
 * Generate all common hashes for a given plaintext
 */
-export function generateHashes(plaintext: string): HashResult {
+export async function generateHashes(plaintext: string): Promise<HashResult> {
  return {
    plaintext,
    md5: crypto.createHash('md5').update(plaintext).digest('hex'),
@@ -103,79 +103,76 @@ export async function findByHash(algorithm: string, hash: string): Promise<HashD
}

/**
- * Check if a plaintext or any of its hashes exist
+ * Check if plaintext or any of its hashes exist
 */
-export async function checkExistence(plaintext: string, hashes?: {
-  md5?: string;
-  sha1?: string;
-  sha256?: string;
-  sha512?: string;
+export async function checkExistence(plaintext: string, hashes: {
+  md5: string;
+  sha1: string;
+  sha256: string;
+  sha512: string;
}): Promise<boolean> {
-  // Check if plaintext exists
-  const plaintextKey = `hash:plaintext:${plaintext}`;
-  const exists = await redisClient.exists(plaintextKey);
-  if (exists) return true;
-
-  // Check if any hash exists
-  if (hashes) {
    const pipeline = redisClient.pipeline();
-    if (hashes.md5) pipeline.exists(`hash:index:md5:${hashes.md5}`);
-    if (hashes.sha1) pipeline.exists(`hash:index:sha1:${hashes.sha1}`);
-    if (hashes.sha256) pipeline.exists(`hash:index:sha256:${hashes.sha256}`);
-    if (hashes.sha512) pipeline.exists(`hash:index:sha512:${hashes.sha512}`);
+  pipeline.exists(`hash:plaintext:${plaintext}`);
+  pipeline.exists(`hash:index:md5:${hashes.md5}`);
+  pipeline.exists(`hash:index:sha1:${hashes.sha1}`);
+  pipeline.exists(`hash:index:sha256:${hashes.sha256}`);
+  pipeline.exists(`hash:index:sha512:${hashes.sha512}`);

    const results = await pipeline.exec();
-    if (results && results.some(([_err, result]) => result === 1)) {
-      return true;
-    }
-  }

-  return false;
+  if (!results) return false;
+
+  // Check if any key exists
+  return results.some(([err, value]) => !err && value === 1);
}
/**
- * Get database statistics
+ * Get index statistics
 */
export async function getStats(): Promise<{ count: number; size: number }> {
  const stats = await redisClient.hgetall('hash:stats');
  return {
    count: parseInt(stats.count || '0', 10),
-    size: parseInt(stats.size || '0', 10),
+    size: parseInt(stats.size || '0', 10)
  };
}

/**
- * Get Redis server info
+ * Initialize Redis (compatibility function, Redis doesn't need explicit initialization)
+ */
+export async function initializeRedis(): Promise<void> {
+  // Check connection
+  await redisClient.ping();
+  console.log('Redis initialized successfully');
+}
+
+/**
+ * Get Redis info for health check
 */
export async function getRedisInfo(): Promise<{
+  connected: boolean;
  version: string;
-  memory: string;
+  usedMemory: number;
  dbSize: number;
}> {
  const info = await redisClient.info('server');
  const memory = await redisClient.info('memory');
  const dbSize = await redisClient.dbsize();

-  const versionMatch = info.match(/redis_version:([^\r\n]+)/);
-  const memoryMatch = memory.match(/used_memory_human:([^\r\n]+)/);
+  // Parse Redis info string
+  const parseInfo = (infoStr: string, key: string): string => {
+    const match = infoStr.match(new RegExp(`${key}:(.+)`));
+    return match ? match[1].trim() : 'unknown';
+  };

  return {
-    version: versionMatch ? versionMatch[1] : 'unknown',
-    memory: memoryMatch ? memoryMatch[1] : 'unknown',
-    dbSize,
+    connected: redisClient.status === 'ready',
+    version: parseInfo(info, 'redis_version'),
+    usedMemory: parseInt(parseInfo(memory, 'used_memory'), 10) || 0,
+    dbSize
  };
}

-/**
- * Initialize Redis connection (just verify it's working)
- */
-export async function initializeRedis(): Promise<void> {
-  try {
-    await redisClient.ping();
-    console.log('Redis connection verified');
-  } catch (error) {
-    console.error('Error connecting to Redis:', error);
-    throw error;
-  }
-}
+export { REDIS_HOST, REDIS_PORT };
@@ -20,15 +20,16 @@
 */
import Redis from 'ioredis';
-import { createReadStream, existsSync, readFileSync, writeFileSync, unlinkSync, openSync, readSync, closeSync } from 'fs';
+import { createReadStream, existsSync, readFileSync, writeFileSync, unlinkSync } from 'fs';
import { resolve, basename } from 'path';
import { createInterface } from 'readline';
-import * as crypto from 'crypto';
+import crypto from 'crypto';

const REDIS_HOST = process.env.REDIS_HOST || 'localhost';
const REDIS_PORT = parseInt(process.env.REDIS_PORT || '6379', 10);
const REDIS_PASSWORD = process.env.REDIS_PASSWORD || undefined;
const REDIS_DB = parseInt(process.env.REDIS_DB || '0', 10);
+const INDEX_NAME = 'hasher';
const DEFAULT_BATCH_SIZE = 100;
interface HashDocument {
@@ -89,12 +90,13 @@ function parseArgs(args: string[]): ParsedArgs {
        result.batchSize = parsed;
      }
    } else if (arg === '--batch-size') {
+      // Support --batch-size <value> format
      const nextArg = args[i + 1];
      if (nextArg && !nextArg.startsWith('-')) {
        const parsed = parseInt(nextArg, 10);
        if (!isNaN(parsed) && parsed > 0) {
          result.batchSize = parsed;
-          i++;
+          i++; // Skip next argument
        }
      }
    } else if (arg.startsWith('--state-file=')) {
@@ -106,6 +108,7 @@ function parseArgs(args: string[]): ParsedArgs {
        i++;
      }
    } else if (!arg.startsWith('-')) {
+      // Positional argument - treat as file path
      result.filePath = arg;
    }
  }
@@ -113,7 +116,50 @@ function parseArgs(args: string[]): ParsedArgs {
  return result;
}

-function generateHashes(plaintext: string): HashDocument {
+function getFileHash(filePath: string): string {
+  // Create a hash based on file path and size for quick identification
+  const stats = require('fs').statSync(filePath);
+  const hashInput = `${filePath}:${stats.size}:${stats.mtime.getTime()}`;
+  return crypto.createHash('md5').update(hashInput).digest('hex').substring(0, 8);
+}
+
+function getDefaultStateFile(filePath: string): string {
+  const fileName = basename(filePath).replace(/\.[^.]+$/, '');
+  return resolve(`.indexer-state-${fileName}.json`);
+}
+
+function loadState(stateFile: string): IndexerState | null {
+  try {
+    if (existsSync(stateFile)) {
+      const data = readFileSync(stateFile, 'utf-8');
+      return JSON.parse(data) as IndexerState;
+    }
+  } catch (error) {
+    console.warn(`⚠️  Could not load state file: ${error}`);
+  }
+  return null;
+}
+
+function saveState(stateFile: string, state: IndexerState): void {
+  try {
+    state.lastUpdate = new Date().toISOString();
+    writeFileSync(stateFile, JSON.stringify(state, null, 2), 'utf-8');
+  } catch (error) {
+    console.error(`❌ Could not save state file: ${error}`);
+  }
+}
+
+function deleteState(stateFile: string): void {
+  try {
+    if (existsSync(stateFile)) {
+      unlinkSync(stateFile);
+    }
+  } catch (error) {
+    console.warn(`⚠️  Could not delete state file: ${error}`);
+  }
+}
+
+async function generateHashes(plaintext: string): Promise<HashDocument> {
  return {
    plaintext,
    md5: crypto.createHash('md5').update(plaintext).digest('hex'),
@@ -148,181 +194,78 @@ Environment Variables:
  REDIS_DB          Redis database number (default: 0)

Examples:
-  # Index a file with default settings
-  npm run index-file -- wordlist.txt
-
-  # Index with custom batch size
-  npm run index-file -- wordlist.txt --batch-size=500
-
-  # Start fresh (ignore previous state)
-  npm run index-file -- wordlist.txt --no-resume
-
-  # Skip duplicate checking for speed
-  npm run index-file -- wordlist.txt --no-check
+  npx tsx scripts/index-file.ts wordlist.txt
+  npx tsx scripts/index-file.ts wordlist.txt --batch-size=500
+  npx tsx scripts/index-file.ts wordlist.txt --batch-size 500
+  npx tsx scripts/index-file.ts wordlist.txt --no-resume
+  npx tsx scripts/index-file.ts wordlist.txt --no-check
+  npm run index-file -- wordlist.txt --batch-size=500 --no-check
+
+State Management:
+  The script automatically saves progress to a state file. If interrupted,
+  it will resume from where it left off on the next run. Use --no-resume
+  to start fresh.
+
+Duplicate Checking:
+  By default, the script checks if each plaintext or hash already exists
+  in the index before inserting. Use --no-check to skip this verification
+  for faster indexing (useful when you're sure there are no duplicates).
`);
-  process.exit(0);
}
-function computeFileHash(filePath: string): string {
+async function indexFile(filePath: string, batchSize: number, shouldResume: boolean, checkDuplicates: boolean, customStateFile: string | null) {
-  // Use streaming for large files to avoid memory issues
-  const hash = crypto.createHash('sha256');
-  const input = createReadStream(filePath, { highWaterMark: 64 * 1024 }); // 64KB chunks
-  let buffer = Buffer.alloc(0);
-  const fd = openSync(filePath, 'r');
-  const chunkSize = 64 * 1024; // 64KB
-  const readBuffer = Buffer.alloc(chunkSize);
-  try {
-    let bytesRead;
-    do {
-      bytesRead = readSync(fd, readBuffer, 0, chunkSize, null);
-      if (bytesRead > 0) {
-        hash.update(readBuffer.subarray(0, bytesRead));
-      }
-    } while (bytesRead > 0);
-  } finally {
-    closeSync(fd);
-  }
-  return hash.digest('hex');
-}
-
-function getStateFilePath(filePath: string, customPath: string | null): string {
-  if (customPath) {
-    return resolve(customPath);
-  }
-  const fileName = basename(filePath);
-  return resolve(`.indexer-state-${fileName}.json`);
-}
-
-function loadState(stateFilePath: string): IndexerState | null {
-  if (!existsSync(stateFilePath)) {
-    return null;
-  }
-  try {
-    const data = readFileSync(stateFilePath, 'utf-8');
-    return JSON.parse(data);
-  } catch (error) {
-    console.warn(`⚠️  Could not load state file: ${error}`);
-    return null;
-  }
-}
-
-function saveState(stateFilePath: string, state: IndexerState): void {
-  try {
-    writeFileSync(stateFilePath, JSON.stringify(state, null, 2), 'utf-8');
-  } catch (error) {
-    console.error(`❌ Could not save state file: ${error}`);
-  }
-}
-
-function deleteState(stateFilePath: string): void {
-  try {
-    if (existsSync(stateFilePath)) {
-      unlinkSync(stateFilePath);
-    }
-  } catch (error) {
-    console.warn(`⚠️  Could not delete state file: ${error}`);
-  }
-}
-
-async function countLines(filePath: string): Promise<number> {
-  return new Promise((resolve, reject) => {
-    let lineCount = 0;
-    const rl = createInterface({
-      input: createReadStream(filePath),
-      crlfDelay: Infinity
-    });
-    rl.on('line', () => lineCount++);
-    rl.on('close', () => resolve(lineCount));
-    rl.on('error', reject);
-  });
-}
async function main() {
const args = process.argv.slice(2);
const parsed = parseArgs(args);
if (parsed.showHelp || !parsed.filePath) {
showHelp();
process.exit(parsed.showHelp ? 0 : 1);
}
const filePath = parsed.filePath!;
const batchSize = parsed.batchSize;
const checkDuplicates = parsed.checkDuplicates;
const absolutePath = resolve(filePath);
if (!existsSync(absolutePath)) {
console.error(`❌ File not found: ${absolutePath}`);
process.exit(1);
}
const stateFile = getStateFilePath(filePath, parsed.stateFile);
const fileHash = computeFileHash(absolutePath);
let state: IndexerState;
let resumingFrom = 0;
if (parsed.resume) {
const loadedState = loadState(stateFile);
if (loadedState && loadedState.fileHash === fileHash) {
state = loadedState;
resumingFrom = state.lastProcessedLine;
console.log(`📂 Resuming from previous state: ${stateFile}`);
} else {
if (loadedState) {
console.log('⚠️ File has changed or state file is from a different file. Starting fresh.');
}
state = {
filePath: absolutePath,
fileHash,
lastProcessedLine: 0,
totalLines: 0,
indexed: 0,
skipped: 0,
errors: 0,
startTime: Date.now(),
lastUpdate: new Date().toISOString()
};
}
} else {
deleteState(stateFile);
state = {
filePath: absolutePath,
fileHash,
lastProcessedLine: 0,
totalLines: 0,
indexed: 0,
skipped: 0,
errors: 0,
startTime: Date.now(),
lastUpdate: new Date().toISOString()
};
}
if (state.totalLines === 0) {
console.log('🔢 Counting lines...');
state.totalLines = await countLines(absolutePath);
}
const client = new Redis({ const client = new Redis({
host: REDIS_HOST, host: REDIS_HOST,
port: REDIS_PORT, port: REDIS_PORT,
password: REDIS_PASSWORD, password: REDIS_PASSWORD,
db: REDIS_DB, db: REDIS_DB,
retryStrategy: (times) => Math.min(times * 50, 2000),
}); });
console.log(''); const absolutePath = resolve(filePath);
console.log('📚 Hasher Indexer'); const stateFile = customStateFile || getDefaultStateFile(absolutePath);
console.log('━'.repeat(42)); const fileHash = getFileHash(absolutePath);
console.log(`Redis: ${REDIS_HOST}:${REDIS_PORT}`);
// State management
let state: IndexerState = {
filePath: absolutePath,
fileHash,
lastProcessedLine: 0,
totalLines: 0,
indexed: 0,
skipped: 0,
errors: 0,
startTime: Date.now(),
lastUpdate: new Date().toISOString()
};
// Check for existing state
const existingState = loadState(stateFile);
let resumingFrom = 0;
if (shouldResume && existingState) {
if (existingState.fileHash === fileHash) {
state = existingState;
resumingFrom = state.lastProcessedLine;
state.startTime = Date.now(); // Reset start time for this session
console.log(`📂 Found existing state, resuming from line ${resumingFrom}`);
} else {
console.log(`⚠️ File has changed since last run, starting fresh`);
deleteState(stateFile);
}
} else if (!shouldResume) {
deleteState(stateFile);
}
console.log(`📚 Hasher Indexer`);
console.log(`━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━`);
console.log(`Redis: ${REDIS_HOST}:${REDIS_PORT} (DB ${REDIS_DB})`);
console.log(`Index: ${INDEX_NAME}`);
console.log(`File: ${filePath}`); console.log(`File: ${filePath}`);
console.log(`Batch size: ${batchSize}`); console.log(`Batch size: ${batchSize}`);
console.log(`Duplicate check: ${checkDuplicates ? 'enabled' : 'disabled (--no-check)'}`); console.log(`Check duplicates: ${checkDuplicates ? 'yes' : 'no (--no-check)'}`);
console.log(`State file: ${stateFile}`);
  if (resumingFrom > 0) {
    console.log(`Resuming from: line ${resumingFrom}`);
    console.log(`Already indexed: ${state.indexed}`);
@@ -330,6 +273,7 @@
  }

  console.log('');

  // Handle interruption signals
  let isInterrupted = false;
  const handleInterrupt = () => {
    if (isInterrupted) {
@@ -341,6 +285,7 @@
    saveState(stateFile, state);
    console.log(`💾 State saved to ${stateFile}`);
    console.log(`   Resume with: npx tsx scripts/index-file.ts ${filePath}`);
    console.log(`   Or start fresh with: npx tsx scripts/index-file.ts ${filePath} --no-resume`);
    process.exit(0);
  };

@@ -348,11 +293,13 @@
  process.on('SIGTERM', handleInterrupt);

  try {
    // Test connection
    console.log('🔗 Connecting to Redis...');
    await client.ping();
    console.log('✅ Connected successfully\n');

    // Process file line by line using streams
    console.log('📖 Processing file...\n');

    let currentLineNumber = 0;
    let currentBatch: string[] = [];
@@ -367,45 +314,78 @@
      crlfDelay: Infinity
    });
    const processBatch = async (batch: string[], lineNumber: number) => {
      if (batch.length === 0) return;
      if (isInterrupted) return;

      // Generate hashes for all items in batch first
      const batchWithHashes = await Promise.all(
        batch.map(async (plaintext: string) => ({
          plaintext,
          hashes: await generateHashes(plaintext)
        }))
      );

      const pipeline = client.pipeline();
      let toIndex: typeof batchWithHashes = [];

      if (checkDuplicates) {
        // Check which items already exist
        const existenceChecks = await Promise.all(
          batchWithHashes.map(async (item) => {
            const plaintextExists = await client.exists(`hash:plaintext:${item.plaintext}`);
            if (plaintextExists) return { item, exists: true };
            // Check if any hash exists
            const md5Exists = await client.exists(`hash:index:md5:${item.hashes.md5}`);
            const sha1Exists = await client.exists(`hash:index:sha1:${item.hashes.sha1}`);
            const sha256Exists = await client.exists(`hash:index:sha256:${item.hashes.sha256}`);
            const sha512Exists = await client.exists(`hash:index:sha512:${item.hashes.sha512}`);
            return {
              item,
              exists: md5Exists || sha1Exists || sha256Exists || sha512Exists
            };
          })
        );

        for (const check of existenceChecks) {
          if (check.exists) {
            state.skipped++;
            sessionSkipped++;
          } else {
            toIndex.push(check.item);
          }
        }
      } else {
        // No duplicate checking - index everything
        toIndex = batchWithHashes;
      }

      // Execute bulk operations
      if (toIndex.length > 0) {
        try {
          for (const item of toIndex) {
            const doc = item.hashes;
            const key = `hash:plaintext:${doc.plaintext}`;
            // Store main document
            pipeline.set(key, JSON.stringify(doc));
            // Create indexes for each hash type
            pipeline.set(`hash:index:md5:${doc.md5}`, doc.plaintext);
            pipeline.set(`hash:index:sha1:${doc.sha1}`, doc.plaintext);
            pipeline.set(`hash:index:sha256:${doc.sha256}`, doc.plaintext);
            pipeline.set(`hash:index:sha512:${doc.sha512}`, doc.plaintext);
            // Update statistics
            pipeline.hincrby('hash:stats', 'count', 1);
            pipeline.hincrby('hash:stats', 'size', JSON.stringify(doc).length);
          }

          const results = await pipeline.exec();

          // Count errors
          const errorCount = results?.filter(([err]) => err !== null).length || 0;
          if (errorCount > 0) {
@@ -418,77 +398,124 @@
            state.indexed += toIndex.length;
            sessionIndexed += toIndex.length;
          }
        } catch (error) {
          console.error(`\n❌ Error processing batch:`, error);
          state.errors += toIndex.length;
          sessionErrors += toIndex.length;
        }
      }

      // Update state
      state.lastProcessedLine = lineNumber;
      state.totalLines = lineNumber;

      // Save state periodically (every 10 batches)
      if (lineNumber % (batchSize * 10) === 0) {
        saveState(stateFile, state);
      }

      // Progress indicator
      const elapsed = ((Date.now() - sessionStartTime) / 1000).toFixed(0);
      process.stdout.write(`\r⏳ Line: ${lineNumber} | Session: +${sessionIndexed} indexed, +${sessionSkipped} skipped | Total: ${state.indexed} indexed | Time: ${elapsed}s`);
    };
    for await (const line of rl) {
      if (isInterrupted) break;

      currentLineNumber++;

      // Skip already processed lines
      if (currentLineNumber <= resumingFrom) {
        continue;
      }

      const trimmedLine = line.trim();
      if (trimmedLine.length > 0) {
        // Only take first word (no spaces or separators)
        const firstWord = trimmedLine.split(/\s+/)[0];
        if (firstWord) {
          currentBatch.push(firstWord);

          if (currentBatch.length >= batchSize) {
            await processBatch(currentBatch, currentLineNumber);
            currentBatch = [];
          }
        }
      }
    }

    // Process remaining items in last batch
    if (currentBatch.length > 0 && !isInterrupted) {
      await processBatch(currentBatch, currentLineNumber);
    }
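The loop above flushes a full batch every batchSize lines and a final partial batch at end of file. The same grouping rule in isolation, as a hypothetical pure helper (not part of the script):

```typescript
// Hypothetical pure helper showing the batching rule the streaming loop
// applies: full slices of batchSize, plus a final partial slice for leftovers.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```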
    if (isInterrupted) {
      return;
    }

    // No refresh needed for Redis
    console.log('\n\n✅ All data persisted to Redis');

    // Delete state file on successful completion
    deleteState(stateFile);

    const duration = ((Date.now() - sessionStartTime) / 1000).toFixed(2);
    const rate = sessionIndexed > 0 ? (sessionIndexed / parseFloat(duration)).toFixed(0) : '0';

    console.log('\n━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━');
    console.log('✅ Indexing complete!');
    console.log(`━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━`);
    console.log(`Total lines processed: ${currentLineNumber}`);
    if (resumingFrom > 0) {
      console.log(`Lines skipped (resumed): ${resumingFrom}`);
      console.log(`Lines processed this session: ${currentLineNumber - resumingFrom}`);
    }
    console.log(`Successfully indexed (total): ${state.indexed}`);
    console.log(`Successfully indexed (session): ${sessionIndexed}`);
    console.log(`Skipped duplicates (total): ${state.skipped}`);
    console.log(`Skipped duplicates (session): ${sessionSkipped}`);
    console.log(`Errors (total): ${state.errors}`);
    console.log(`Session duration: ${duration}s`);
    console.log(`Session rate: ${rate} docs/sec`);
    console.log('');
  } catch (error) {
    // Save state on error so the run can be resumed
    saveState(stateFile, state);
    console.error(`\n💾 State saved to ${stateFile}`);
    console.error('❌ Error:', error instanceof Error ? error.message : error);
    process.exit(1);
  } finally {
    // Remove signal handlers and close the Redis connection so the
    // process can exit cleanly on success
    process.removeListener('SIGINT', handleInterrupt);
    process.removeListener('SIGTERM', handleInterrupt);
    await client.quit();
  }
}
// Parse command line arguments
const args = process.argv.slice(2);
const parsedArgs = parseArgs(args);
if (parsedArgs.showHelp || !parsedArgs.filePath) {
showHelp();
}
const filePath = parsedArgs.filePath as string;
// Validate file exists
if (!existsSync(filePath)) {
console.error(`❌ File not found: ${filePath}`);
process.exit(1);
}
console.log(`\n🔧 Configuration:`);
console.log(` File: ${filePath}`);
console.log(` Batch size: ${parsedArgs.batchSize}`);
console.log(` Resume: ${parsedArgs.resume}`);
console.log(` Check duplicates: ${parsedArgs.checkDuplicates}`);
if (parsedArgs.stateFile) {
console.log(` State file: ${parsedArgs.stateFile}`);
}
console.log('');
indexFile(filePath, parsedArgs.batchSize, parsedArgs.resume, parsedArgs.checkDuplicates, parsedArgs.stateFile).catch(console.error);

View file

@@ -13,8 +13,7 @@
 * Options:
 *   --dry-run        Show duplicates without removing them (default)
 *   --execute        Actually remove the duplicates
 *   --field=<field>  Check duplicates only on this field (plaintext, md5, sha1, sha256, sha512)
 *   --help, -h       Show this help message
 */
@@ -24,20 +23,10 @@
const REDIS_HOST = process.env.REDIS_HOST || 'localhost';
const REDIS_PORT = parseInt(process.env.REDIS_PORT || '6379', 10);
const REDIS_PASSWORD = process.env.REDIS_PASSWORD || undefined;
const REDIS_DB = parseInt(process.env.REDIS_DB || '0', 10);
const INDEX_NAME = 'hasher';

interface ParsedArgs {
  dryRun: boolean;
  field: string | null;
  showHelp: boolean;
}
@@ -50,10 +39,18 @@
  deletePlaintexts: string[];
}

interface HashDocument {
  plaintext: string;
  md5: string;
  sha1: string;
  sha256: string;
  sha512: string;
  created_at: string;
}
function parseArgs(args: string[]): ParsedArgs {
  const result: ParsedArgs = {
    dryRun: true,
    field: null,
    showHelp: false
  };
@@ -67,21 +64,6 @@ function parseArgs(args: string[]): ParsedArgs {
      result.dryRun = true;
    } else if (arg === '--execute') {
      result.dryRun = false;
    } else if (arg.startsWith('--field=')) {
      result.field = arg.split('=')[1];
    } else if (arg === '--field') {
@@ -107,9 +89,8 @@
Usage:
Options:
  --dry-run        Show duplicates without removing them (default)
  --execute        Actually remove the duplicates
  --field=<field>  Check duplicates only on this field
                   Valid fields: plaintext, md5, sha1, sha256, sha512
  --help, -h       Show this help message

Environment Variables:
@@ -119,81 +100,90 @@ Environment Variables:
  REDIS_DB          Redis database number (default: 0)

Examples:
  npx tsx scripts/remove-duplicates.ts                    # Dry run, show all duplicates
  npx tsx scripts/remove-duplicates.ts --execute          # Remove all duplicates
  npx tsx scripts/remove-duplicates.ts --field=md5        # Check only md5 duplicates
  npx tsx scripts/remove-duplicates.ts --execute --field=plaintext

Notes:
  - The script keeps the OLDEST document (by created_at) and removes newer duplicates
  - Always run with --dry-run first to review what will be deleted
  - Duplicates are checked across all hash fields by default
`);
  process.exit(0);
}
async function findDuplicatesForField(
  client: Redis,
  field: string
): Promise<DuplicateGroup[]> {
  const duplicates: DuplicateGroup[] = [];

  console.log(`   Scanning for ${field} duplicates...`);

  // Get all keys for this field type
  const pattern = field === 'plaintext'
    ? 'hash:plaintext:*'
    : `hash:index:${field}:*`;

  const keys = await client.keys(pattern);

  // For hash indexes, group by hash value (not plaintext)
  const valueMap = new Map<string, string[]>();

  if (field === 'plaintext') {
    // Each key is already unique for plaintext
    // Check for same plaintext with different created_at
    for (const key of keys) {
      const plaintext = key.replace('hash:plaintext:', '');
      if (!valueMap.has(plaintext)) {
        valueMap.set(plaintext, []);
      }
      valueMap.get(plaintext)!.push(plaintext);
    }
  } else {
    // For hash fields, get the plaintext and check if multiple plaintexts have same hash
    for (const key of keys) {
      const hashValue = key.replace(`hash:index:${field}:`, '');
      const plaintext = await client.get(key);
      if (plaintext) {
        if (!valueMap.has(hashValue)) {
          valueMap.set(hashValue, []);
        }
        valueMap.get(hashValue)!.push(plaintext);
      }
    }
  }

  // Find groups with duplicates
  for (const [value, plaintexts] of valueMap) {
    const uniquePlaintexts = Array.from(new Set(plaintexts));
    if (uniquePlaintexts.length > 1) {
      // Get documents to compare timestamps
      const docs: { plaintext: string; doc: HashDocument }[] = [];
      for (const plaintext of uniquePlaintexts) {
        const docKey = `hash:plaintext:${plaintext}`;
        const docData = await client.get(docKey);
        if (docData) {
          docs.push({ plaintext, doc: JSON.parse(docData) });
        }
      }

      // Sort by created_at (oldest first)
      docs.sort((a, b) =>
        new Date(a.doc.created_at).getTime() - new Date(b.doc.created_at).getTime()
      );

      if (docs.length > 1) {
        duplicates.push({
          value,
          field,
          plaintexts: docs.map(d => d.plaintext),
          keepPlaintext: docs[0].plaintext,
          deletePlaintexts: docs.slice(1).map(d => d.plaintext)
        });
      }
    }
  }
@@ -202,24 +192,106 @@
  return duplicates;
}
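The keep-oldest policy used above (sort by created_at, keep the first, delete the rest) can be isolated as a small pure function. Names here are illustrative, not part of the script:

```typescript
interface TimestampedDoc {
  plaintext: string;
  created_at: string;
}

// Illustrative helper: given all documents that share one hash value, keep
// the oldest (by created_at) and mark the rest for deletion, mirroring the
// rule the duplicate scan applies.
function splitKeepOldest(docs: TimestampedDoc[]): { keep: string; remove: string[] } {
  const sorted = [...docs].sort(
    (a, b) => new Date(a.created_at).getTime() - new Date(b.created_at).getTime()
  );
  return {
    keep: sorted[0].plaintext,
    remove: sorted.slice(1).map(d => d.plaintext)
  };
}
```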
async function removeDuplicates(parsedArgs: ParsedArgs) {
  const client = new Redis({
    host: REDIS_HOST,
    port: REDIS_PORT,
    password: REDIS_PASSWORD,
    db: REDIS_DB,
  });
const fields = parsedArgs.field
? [parsedArgs.field]
: ['md5', 'sha1', 'sha256', 'sha512'];
console.log(`🔍 Hasher Duplicate Remover`);
console.log(`━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━`);
console.log(`Redis: ${REDIS_HOST}:${REDIS_PORT} (DB ${REDIS_DB})`);
console.log(`Index: ${INDEX_NAME}`);
console.log(`Mode: ${parsedArgs.dryRun ? '🔎 DRY RUN (no changes)' : '⚠️ EXECUTE (will delete)'}`);
console.log(`Fields to check: ${fields.join(', ')}`);
console.log('');
try {
// Test connection
console.log('🔗 Connecting to Redis...');
await client.ping();
console.log('✅ Connected successfully\n');
// Get index stats
const stats = await client.hgetall('hash:stats');
const totalCount = parseInt(stats.count || '0', 10);
console.log(`📊 Total documents in index: ${totalCount}\n`);
const allDuplicates: DuplicateGroup[] = [];
const seenPlaintexts = new Set<string>();
// Find duplicates for each field
for (const field of fields) {
console.log(`🔍 Checking duplicates for field: ${field}...`);
const fieldDuplicates = await findDuplicatesForField(client, field);
// Filter out already seen plaintexts
for (const dup of fieldDuplicates) {
const newDeletePlaintexts = dup.deletePlaintexts.filter(p => !seenPlaintexts.has(p));
if (newDeletePlaintexts.length > 0) {
dup.deletePlaintexts = newDeletePlaintexts;
newDeletePlaintexts.forEach(p => seenPlaintexts.add(p));
allDuplicates.push(dup);
}
}
console.log(` Found ${fieldDuplicates.length} duplicate groups for ${field}`);
}
const totalToDelete = allDuplicates.reduce((sum, dup) => sum + dup.deletePlaintexts.length, 0);
console.log(`\n━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━`);
console.log(`📋 Summary:`);
console.log(` Duplicate groups found: ${allDuplicates.length}`);
console.log(` Documents to delete: ${totalToDelete}`);
console.log(`━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n`);
if (allDuplicates.length === 0) {
console.log('✨ No duplicates found! Index is clean.\n');
await client.quit();
return;
}
// Show sample of duplicates
console.log(`📝 Sample duplicates (showing first 10):\n`);
const samplesToShow = allDuplicates.slice(0, 10);
for (const dup of samplesToShow) {
const truncatedValue = dup.value.length > 50
? dup.value.substring(0, 50) + '...'
: dup.value;
console.log(` Field: ${dup.field}`);
console.log(` Value: ${truncatedValue}`);
console.log(` Keep: ${dup.keepPlaintext}`);
console.log(` Delete: ${dup.deletePlaintexts.length} document(s)`);
console.log('');
}
if (allDuplicates.length > 10) {
console.log(` ... and ${allDuplicates.length - 10} more duplicate groups\n`);
}
if (parsedArgs.dryRun) {
console.log(`━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━`);
console.log(`🔎 DRY RUN - No changes made`);
console.log(` Run with --execute to remove ${totalToDelete} duplicate documents`);
console.log(`━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n`);
await client.quit();
return;
}
// Execute deletion
console.log(`\n🗑 Removing ${totalToDelete} duplicate documents...\n`);
    let deleted = 0;
    let errors = 0;

    for (const dup of allDuplicates) {
      for (const plaintext of dup.deletePlaintexts) {
        try {
          const docKey = `hash:plaintext:${plaintext}`;
@@ -229,7 +301,7 @@ async function removeDuplicates(
          const doc: HashDocument = JSON.parse(docData);
          const pipeline = client.pipeline();

          // Delete main document
          pipeline.del(docKey);

          // Delete all indexes
@@ -250,101 +322,58 @@ async function removeDuplicates(
              deleted++;
            }
          }
          process.stdout.write(`\r⏳ Progress: ${deleted + errors}/${totalToDelete} - Deleted: ${deleted}, Errors: ${errors}`);
        } catch (error) {
          console.error(`\n❌ Error deleting ${plaintext}:`, error);
          errors++;
        }
      }
    }
// Get new count
const newStats = await client.hgetall('hash:stats');
const newCount = parseInt(newStats.count || '0', 10);
console.log('\n\n━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━');
console.log('✅ Duplicate removal complete!');
console.log(`━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━`);
console.log(`Documents deleted: ${deleted}`);
console.log(`Errors: ${errors}`);
console.log(`Previous document count: ${totalCount}`);
console.log(`New document count: ${newCount}`);
    console.log('');

    await client.quit();
  } catch (error) {
    console.error('\n❌ Error:', error instanceof Error ? error.message : error);
    process.exit(1);
  }
}

// Parse command line arguments
const args = process.argv.slice(2);
const parsedArgs = parseArgs(args);
if (parsedArgs.showHelp) {
showHelp();
}
// Validate field if provided
const validFields = ['plaintext', 'md5', 'sha1', 'sha256', 'sha512'];
if (parsedArgs.field && !validFields.includes(parsedArgs.field)) {
console.error(`❌ Invalid field: ${parsedArgs.field}`);
console.error(` Valid fields: ${validFields.join(', ')}`);
process.exit(1);
}
console.log(`\n🔧 Configuration:`);
console.log(` Mode: ${parsedArgs.dryRun ? 'dry-run' : 'execute'}`);
if (parsedArgs.field) {
console.log(` Field: ${parsedArgs.field}`);
} else {
console.log(` Fields: all (md5, sha1, sha256, sha512)`);
}
console.log('');
removeDuplicates(parsedArgs).catch(console.error);