Compare commits


1 commit

Author: ale
SHA1: b91d19dc0b
Message: fix memory remove dup
Signed-off-by: ale <ale@manalejandro.com>
Date: 2025-12-21 22:36:31 +01:00
19 files changed, 988 additions and 1099 deletions

API.md (37 lines changed) · View file

@@ -102,7 +102,7 @@ Content-Type: application/json
 }
 ```
-Note: When plaintext is provided, it is automatically stored in Redis for future lookups.
+Note: When plaintext is provided, it is automatically indexed in Elasticsearch for future lookups.
 #### Error Responses
@@ -113,7 +113,7 @@ Note: When plaintext is provided, it is automatically stored in Redis for future
 }
 ```
-**500 Internal Server Error** - Server or Redis error:
+**500 Internal Server Error** - Server or Elasticsearch error:
 ```json
 {
   "error": "Internal server error",
@@ -127,7 +127,7 @@ Note: When plaintext is provided, it is automatically stored in Redis for future
 **Endpoint**: `GET /api/health`
-**Description**: Check the health of the application and Redis connection.
+**Description**: Check the health of the application and Elasticsearch connection.
 #### Request
@@ -139,28 +139,31 @@ No parameters required.
 ```json
 {
   "status": "ok",
-  "redis": {
-    "version": "7.2.0",
-    "memory": "1.5M",
-    "dbSize": 1542
+  "elasticsearch": {
+    "cluster": "elasticsearch",
+    "status": "green"
   },
-  "stats": {
-    "count": 1542,
-    "size": 524288
+  "index": {
+    "exists": true,
+    "name": "hasher",
+    "stats": {
+      "documentCount": 1542,
+      "indexSize": 524288
+    }
   }
 }
 ```
-**Redis status fields**:
+**Elasticsearch cluster status values**:
-- `version`: Redis server version
-- `memory`: Memory used by Redis
-- `dbSize`: Total number of keys in database
+- `green`: All primary and replica shards are active
+- `yellow`: All primary shards are active, but not all replicas
+- `red`: Some primary shards are not active
 **Error** (503 Service Unavailable):
 ```json
 {
   "status": "error",
-  "error": "Connection refused to Redis"
+  "error": "Connection refused to Elasticsearch"
 }
 ```
@@ -249,7 +252,7 @@ The API accepts requests from any origin by default. For production deployment,
 ## Notes
 - All timestamps are in ISO 8601 format
-- The API automatically creates Redis keys with proper structure
-- Plaintext searches are automatically stored for future lookups
+- The API automatically creates the Elasticsearch index if it doesn't exist
+- Plaintext searches are automatically indexed for future lookups
 - Searches are case-insensitive
 - Hashes must be valid hexadecimal strings
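The two validation notes at the end of API.md (case-insensitive searches, hex-only hashes) imply a small pre-lookup check. A minimal sketch of that logic in TypeScript — the helper name and the assumption that digest length alone identifies the algorithm are illustrative, not the project's actual code:

```typescript
// Hypothetical validation helper; the real route handler may differ.
type Algorithm = "md5" | "sha1" | "sha256" | "sha512";

// Hex-digest lengths for the four supported algorithms.
const DIGEST_LENGTHS: Record<number, Algorithm> = {
  32: "md5",
  40: "sha1",
  64: "sha256",
  128: "sha512",
};

// A query is treated as a hash only if it is pure hex AND has a known digest length.
function detectAlgorithm(query: string): Algorithm | null {
  const normalized = query.trim().toLowerCase(); // searches are case-insensitive
  if (!/^[0-9a-f]+$/.test(normalized)) return null;
  return DIGEST_LENGTHS[normalized.length] ?? null;
}

console.log(detectAlgorithm("5f4dcc3b5aa765d61d8327deb882cf99")); // "md5"
```

Anything that fails this check would fall through to the plaintext-generation path rather than a hash lookup.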

View file

@@ -5,37 +5,6 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
-## [2.0.0] - 2025-12-03
-### Changed
-#### Major Backend Migration
-- **Breaking Change**: Migrated from Elasticsearch to Redis for improved performance
-- Replaced Elasticsearch Client with ioredis for Redis operations
-- Redesigned data structure using Redis key patterns
-- Implemented O(1) hash lookups using Redis indexes
-- Significantly reduced search latency (< 10ms typical)
-#### New Redis Architecture
-- Document storage: `hash:plaintext:{plaintext}` keys
-- Hash indexes: `hash:index:{algorithm}:{hash}` for fast lookups
-- Statistics tracking: `hash:stats` Redis Hash
-- Pipeline operations for atomic batch writes
-- Connection pooling with automatic retry strategy
-### Updated
-#### Configuration
-- Environment variables changed from `ELASTICSEARCH_NODE` to `REDIS_HOST`, `REDIS_PORT`, `REDIS_PASSWORD`, `REDIS_DB`
-- Simplified connection setup with sensible defaults
-- Optional Redis authentication support
-#### Performance Improvements
-- Search latency reduced to < 10ms (from ~50ms)
-- Bulk indexing maintained at 1000-5000 docs/sec
-- Lower memory footprint
-- Better concurrent request handling (100+ users)
 ## [1.0.0] - 2025-12-03
 ### Added
@@ -48,12 +17,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Copy to clipboard functionality for all hash values
 #### Backend
-- Redis integration with ioredis
-- Key-value storage with hash indexes
-- Automatic key structure initialization
-- Auto-storage of searched plaintext for future lookups
+- Elasticsearch integration with configurable endpoint
+- Custom index mapping with 10 shards for horizontal scaling
+- Automatic index creation on first use
+- Auto-indexing of searched plaintext for future lookups
 - RESTful API endpoints for search and health checks
-- Case-insensitive searches
+- Lowercase analyzer for case-insensitive searches
 #### Frontend
 - Modern, responsive UI with gradient design
@@ -93,7 +62,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 #### Dependencies
 - Next.js 16.0.7
 - React 19.2.0
-- ioredis 5.4.2
+- Elasticsearch Client 8.x
 - Lucide React (icons)
 - Tailwind CSS 4.x
 - TypeScript 5.x
@@ -106,35 +75,28 @@ hasher/
 │   ├── layout.tsx        # Root layout
 │   └── page.tsx          # Main page
 ├── lib/                  # Utility libraries
-│   ├── redis.ts          # Redis client
+│   ├── elasticsearch.ts  # ES client
 │   └── hash.ts           # Hash utilities
 ├── scripts/              # CLI scripts
-│   ├── index-file.ts     # Bulk indexer
-│   └── remove-duplicates.ts # Duplicate removal
+│   └── index-file.ts     # Bulk indexer
 └── docs/                 # Documentation
 ```
-#### Redis Data Structure
-- Main documents: `hash:plaintext:{plaintext}`
-- MD5 index: `hash:index:md5:{hash}`
-- SHA1 index: `hash:index:sha1:{hash}`
-- SHA256 index: `hash:index:sha256:{hash}`
-- SHA512 index: `hash:index:sha512:{hash}`
-- Statistics: `hash:stats` (Redis Hash with count and size)
+#### Elasticsearch Index Schema
+- Index name: `hasher`
+- Shards: 10
+- Replicas: 1
+- Fields: plaintext, md5, sha1, sha256, sha512, created_at
 ### Configuration
 #### Environment Variables
-- `REDIS_HOST`: Redis host (default: localhost)
-- `REDIS_PORT`: Redis port (default: 6379)
-- `REDIS_PASSWORD`: Redis password (optional)
-- `REDIS_DB`: Redis database number (default: 0)
+- `ELASTICSEARCH_NODE`: Elasticsearch endpoint (default: http://localhost:9200)
 #### Performance
 - Bulk indexing: 1000-5000 docs/sec
-- Search latency: < 10ms typical (O(1) lookups)
-- Horizontal scaling ready with Redis Cluster
-- Lower memory footprint than Elasticsearch
+- Search latency: < 50ms typical
+- Horizontal scaling ready
 ### Security
 - Input validation on all endpoints

View file

@@ -16,7 +16,7 @@ Thank you for considering contributing to Hasher! This document provides guideli
 ## 🎯 Areas for Contribution
 ### Features
-- Additional hash algorithms (argon2, etc.)
+- Additional hash algorithms (bcrypt validation, argon2, etc.)
 - Export functionality (CSV, JSON)
 - Search history
 - Batch hash lookup
@@ -48,7 +48,7 @@ Thank you for considering contributing to Hasher! This document provides guideli
 Before submitting a PR:
 1. Test the web interface thoroughly
 2. Test the bulk indexing script
-3. Verify Redis integration
+3. Verify Elasticsearch integration
 4. Check for TypeScript errors: `npm run build`
 5. Run linter: `npm run lint`

View file

@@ -5,7 +5,7 @@ This guide covers deploying the Hasher application to production.
 ## Prerequisites
 - Node.js 18.x or higher
-- Redis 6.x or higher
+- Elasticsearch 8.x cluster
 - Domain name (optional, for custom domain)
 - SSL certificate (recommended for production)
@@ -34,16 +34,12 @@ Vercel provides seamless deployment for Next.js applications.
 4. **Set Environment Variables**:
    - Go to your project settings on Vercel
-   - Add environment variables:
-     - `REDIS_HOST=your-redis-host`
-     - `REDIS_PORT=6379`
-     - `REDIS_PASSWORD=your-password` (if using authentication)
-     - `REDIS_DB=0`
+   - Add environment variable: `ELASTICSEARCH_NODE=http://your-elasticsearch-host:9200`
    - Redeploy: `vercel --prod`
 #### Important Notes:
-- Ensure Redis is accessible from Vercel's servers
-- Consider using Redis Cloud (Upstash) or a publicly accessible Redis instance
+- Ensure Elasticsearch is accessible from Vercel's servers
+- Consider using Elastic Cloud or a publicly accessible Elasticsearch instance
 - Use environment variables for sensitive configuration
 ---
@@ -120,8 +116,7 @@ docker build -t hasher:latest .
 # Run the container
 docker run -d \
   -p 3000:3000 \
-  -e REDIS_HOST=redis \
-  -e REDIS_PORT=6379 \
+  -e ELASTICSEARCH_NODE=http://elasticsearch:9200 \
   --name hasher \
   hasher:latest
 ```
@@ -139,23 +134,25 @@ services:
     ports:
       - "3000:3000"
     environment:
-      - REDIS_HOST=redis
-      - REDIS_PORT=6379
+      - ELASTICSEARCH_NODE=http://elasticsearch:9200
     depends_on:
-      - redis
+      - elasticsearch
     restart: unless-stopped
-  redis:
-    image: redis:7-alpine
+  elasticsearch:
+    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
+    environment:
+      - discovery.type=single-node
+      - xpack.security.enabled=false
+      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
     ports:
-      - "6379:6379"
+      - "9200:9200"
     volumes:
-      - redis-data:/data
+      - elasticsearch-data:/usr/share/elasticsearch/data
     restart: unless-stopped
-    command: redis-server --appendonly yes
 volumes:
-  redis-data:
+  elasticsearch-data:
 ```
 Run with:
@@ -196,10 +193,7 @@ npm run build
 ```bash
 cat > .env.local << EOF
-REDIS_HOST=localhost
-REDIS_PORT=6379
-REDIS_PASSWORD=your-password
-REDIS_DB=0
+ELASTICSEARCH_NODE=http://localhost:9200
 NODE_ENV=production
 EOF
 ```
@@ -239,43 +233,43 @@ sudo systemctl reload nginx
 ---
-## Redis Setup
+## Elasticsearch Setup
-### Option 1: Redis Cloud (Managed)
+### Option 1: Elastic Cloud (Managed)
-1. Sign up at [Redis Cloud](https://redis.com/try-free/) or [Upstash](https://upstash.com/)
-2. Create a database
-3. Note the connection details (host, port, password)
-4. Update `REDIS_HOST`, `REDIS_PORT`, and `REDIS_PASSWORD` environment variables
+1. Sign up at [Elastic Cloud](https://cloud.elastic.co/)
+2. Create a deployment
+3. Note the endpoint URL
+4. Update `ELASTICSEARCH_NODE` environment variable
 ### Option 2: Self-Hosted
 ```bash
 # Ubuntu/Debian
+wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
+sudo sh -c 'echo "deb https://artifacts.elastic.co/packages/8.x/apt stable main" > /etc/apt/sources.list.d/elastic-8.x.list'
 sudo apt-get update
-sudo apt-get install redis-server
+sudo apt-get install elasticsearch
 # Configure
-sudo nano /etc/redis/redis.conf
-# Set: bind 0.0.0.0 (to allow remote connections)
-# Set: requirepass your-strong-password (for security)
+sudo nano /etc/elasticsearch/elasticsearch.yml
+# Set: network.host: 0.0.0.0
 # Start
-sudo systemctl start redis-server
-sudo systemctl enable redis-server
+sudo systemctl start elasticsearch
+sudo systemctl enable elasticsearch
 ```
 ---
 ## Security Considerations
-### 1. Redis Security
+### 1. Elasticsearch Security
-- Enable authentication with requirepass
-- Use TLS for Redis connections (Redis 6+)
+- Enable authentication on Elasticsearch
+- Use HTTPS for Elasticsearch connection
 - Restrict network access with firewall rules
 - Update credentials regularly
-- Disable dangerous commands (FLUSHDB, FLUSHALL, etc.)
 ### 2. Application Security
@@ -291,7 +285,7 @@ sudo systemctl enable redis-server
 # Example UFW firewall rules
 sudo ufw allow 80/tcp
 sudo ufw allow 443/tcp
-sudo ufw allow from YOUR_IP to any port 6379  # Redis
+sudo ufw allow from YOUR_IP to any port 9200  # Elasticsearch
 sudo ufw enable
 ```
@@ -309,48 +303,37 @@ pm2 monit
 pm2 logs hasher
 ```
-### Redis Monitoring
+### Elasticsearch Monitoring
 ```bash
 # Health check
-redis-cli ping
+curl http://localhost:9200/_cluster/health?pretty
-# Get info
-redis-cli INFO
-# Database stats
-redis-cli INFO stats
-# Memory usage
-redis-cli INFO memory
+# Index stats
+curl http://localhost:9200/hasher/_stats?pretty
 ```
 ---
 ## Backup and Recovery
-### Redis Backups
+### Elasticsearch Snapshots
 ```bash
-# Enable AOF (Append Only File) persistence
-redis-cli CONFIG SET appendonly yes
+# Configure snapshot repository
+curl -X PUT "localhost:9200/_snapshot/hasher_backup" -H 'Content-Type: application/json' -d'
+{
+  "type": "fs",
+  "settings": {
+    "location": "/mnt/backups/elasticsearch"
+  }
+}'
-# Save RDB snapshot manually
-redis-cli SAVE
+# Create snapshot
+curl -X PUT "localhost:9200/_snapshot/hasher_backup/snapshot_1?wait_for_completion=true"
-# Configure automatic backups in redis.conf
-save 900 1      # Save if 1 key changed in 15 minutes
-save 300 10     # Save if 10 keys changed in 5 minutes
-save 60 10000   # Save if 10000 keys changed in 1 minute
-# Backup files location (default)
-# RDB: /var/lib/redis/dump.rdb
-# AOF: /var/lib/redis/appendonly.aof
-# Restore from backup
-sudo systemctl stop redis-server
-sudo cp /backup/dump.rdb /var/lib/redis/
-sudo systemctl start redis-server
+# Restore snapshot
+curl -X POST "localhost:9200/_snapshot/hasher_backup/snapshot_1/_restore"
 ```
 ---
@@ -361,14 +344,13 @@ sudo systemctl start redis-server
 1. Deploy multiple Next.js instances
 2. Use a load balancer (nginx, HAProxy)
-3. Share the same Redis instance or cluster
+3. Share the same Elasticsearch cluster
-### Redis Scaling
+### Elasticsearch Scaling
-1. Use Redis Cluster for horizontal scaling
-2. Set up Redis Sentinel for high availability
-3. Use read replicas for read-heavy workloads
-4. Consider Redis Enterprise for advanced features
+1. Add more nodes to the cluster
+2. Increase shard count (already set to 10)
+3. Use replicas for read scaling
 ---
@@ -381,31 +363,28 @@ pm2 status
 pm2 logs hasher --lines 100
 ```
-### Check Redis
+### Check Elasticsearch
 ```bash
-redis-cli ping
-redis-cli DBSIZE
-redis-cli INFO stats
+curl http://localhost:9200/_cluster/health
+curl http://localhost:9200/hasher/_count
 ```
 ### Common Issues
-**Issue**: Cannot connect to Redis
+**Issue**: Cannot connect to Elasticsearch
 - Check firewall rules
-- Verify Redis is running: `redis-cli ping`
-- Check `REDIS_HOST`, `REDIS_PORT`, and `REDIS_PASSWORD` environment variables
+- Verify Elasticsearch is running
+- Check `ELASTICSEARCH_NODE` environment variable
 **Issue**: Out of memory
 - Increase Node.js memory: `NODE_OPTIONS=--max-old-space-size=4096`
-- Configure Redis maxmemory and eviction policy
-- Use Redis persistence (RDB/AOF) carefully
+- Increase Elasticsearch heap size
 **Issue**: Slow searches
-- Verify O(1) lookups are being used (direct key access)
-- Check Redis memory and CPU usage
-- Consider using Redis Cluster for distribution
-- Optimize key patterns
+- Add more Elasticsearch nodes
+- Optimize queries
+- Increase replica count
 ---
@@ -413,10 +392,9 @@ redis-cli INFO stats
 1. **Enable Next.js Static Optimization**
 2. **Use CDN for static assets**
-3. **Enable Redis pipelining for bulk operations**
-4. **Configure appropriate maxmemory for Redis**
-5. **Use SSD storage for Redis persistence**
-6. **Enable Redis connection pooling (already implemented)**
+3. **Enable Elasticsearch caching**
+4. **Configure appropriate JVM heap for Elasticsearch**
+5. **Use SSD storage for Elasticsearch**
 ---
@@ -424,6 +402,5 @@ redis-cli INFO stats
 For deployment issues, check:
 - [Next.js Deployment Docs](https://nextjs.org/docs/deployment)
-- [Redis Setup Guide](https://redis.io/docs/getting-started/)
-- [ioredis Documentation](https://github.com/redis/ioredis)
+- [Elasticsearch Setup Guide](https://www.elastic.co/guide/en/elasticsearch/reference/current/setup.html)
 - Project GitHub Issues

View file

@@ -2,7 +2,7 @@
 ## 📋 Project Overview
-**Hasher** is a modern, high-performance hash search and generation tool built with Next.js and powered by Redis. It provides a beautiful web interface for searching hash values and generating cryptographic hashes from plaintext.
+**Hasher** is a modern, high-performance hash search and generation tool built with Next.js and powered by Elasticsearch. It provides a beautiful web interface for searching hash values and generating cryptographic hashes from plaintext.
 ### Version: 1.0.0
 ### Status: ✅ Production Ready
@@ -25,10 +25,10 @@
 - Copy-to-clipboard functionality
 ### 📊 Backend
-- Redis integration with ioredis
-- Key-value storage with hash indexes
+- Elasticsearch 8.x integration
+- 10-shard index for horizontal scaling
 - RESTful API with JSON responses
-- Automatic key structure initialization
+- Automatic index creation and initialization
 - Health monitoring endpoint
 ### 🎨 Frontend
@@ -52,7 +52,7 @@
 ### Stack
 - **Frontend**: Next.js 16.0, React 19.2, Tailwind CSS 4.x
 - **Backend**: Next.js API Routes, Node.js 18+
-- **Database**: Redis 6.x+
+- **Database**: Elasticsearch 8.x
 - **Language**: TypeScript 5.x
 - **Icons**: Lucide React
@@ -68,7 +68,7 @@ hasher/
 │   └── globals.css       # Global styles
 ├── lib/
-│   ├── redis.ts          # Redis client & config
+│   ├── elasticsearch.ts  # ES client & config
 │   └── hash.ts           # Hash utilities
 ├── scripts/
@@ -106,7 +106,7 @@ Search for hashes or generate from plaintext
 - **Output**: Hash results or generated hashes
 ### GET /api/health
-Check system health and Redis status
+Check system health and Elasticsearch status
 - **Output**: System status and statistics
 ---
@@ -139,34 +139,28 @@ npm run index-file wordlist.txt -- --batch-size 500
 ### Environment Configuration
 ```bash
-# Optional: Set Redis connection details
-export REDIS_HOST=localhost
-export REDIS_PORT=6379
-export REDIS_PASSWORD=your-password
-export REDIS_DB=0
+# Optional: Set Elasticsearch endpoint
+export ELASTICSEARCH_NODE=http://localhost:9200
 ```
 ---
-## 🗄️ Redis Data Structure
+## 🗄️ Elasticsearch Configuration
-### Key Patterns
-- **Documents**: `hash:plaintext:{plaintext}` - Main document storage
-- **MD5 Index**: `hash:index:md5:{hash}` - MD5 hash lookup
-- **SHA1 Index**: `hash:index:sha1:{hash}` - SHA1 hash lookup
-- **SHA256 Index**: `hash:index:sha256:{hash}` - SHA256 hash lookup
-- **SHA512 Index**: `hash:index:sha512:{hash}` - SHA512 hash lookup
-- **Statistics**: `hash:stats` - Redis Hash with count and size
+### Index: `hasher`
+- **Shards**: 10 (horizontal scaling)
+- **Replicas**: 1 (redundancy)
+- **Analyzer**: Custom lowercase analyzer
-### Document Schema
+### Schema
 ```json
 {
-  "plaintext": "string",
-  "md5": "string",
-  "sha1": "string",
-  "sha256": "string",
-  "sha512": "string",
-  "created_at": "ISO 8601 date string"
+  "plaintext": "text + keyword",
+  "md5": "keyword",
+  "sha1": "keyword",
+  "sha256": "keyword",
+  "sha512": "keyword",
+  "created_at": "date"
 }
 ```
@@ -186,9 +180,9 @@ export REDIS_DB=0
 ## 🚀 Performance Metrics
 - **Bulk Indexing**: 1000-5000 docs/sec
-- **Search Latency**: <10ms (typical O(1) lookups)
-- **Concurrent Users**: 100+ supported
-- **Horizontal Scaling**: Ready with Redis Cluster
+- **Search Latency**: <50ms (typical)
+- **Concurrent Users**: 50+ supported
+- **Horizontal Scaling**: Ready with 10 shards
 ---
@@ -226,9 +220,9 @@ export REDIS_DB=0
 ### Requirements
 - Node.js 18.x or higher
-- Redis 6.x or higher
+- Elasticsearch 8.x
 - 512MB RAM minimum
-- Redis server running locally or remotely
+- Internet connection for Elasticsearch
 ---
@@ -291,7 +285,7 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file
 ## 🙏 Acknowledgments
 - Built with [Next.js](https://nextjs.org/)
-- Powered by [Redis](https://redis.io/)
+- Powered by [Elasticsearch](https://www.elastic.co/)
 - Icons by [Lucide](https://lucide.dev/)
 - Styled with [Tailwind CSS](https://tailwindcss.com/)
@@ -319,7 +313,7 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file
 ### Completed ✅
 - [x] Core hash search functionality
 - [x] Hash generation from plaintext
-- [x] Redis integration
+- [x] Elasticsearch integration
 - [x] Modern responsive UI
 - [x] Bulk indexing script
 - [x] API endpoints

View file

@@ -45,35 +45,32 @@ GET /api/health
 - **Web Interface**: http://localhost:3000
 - **Search API**: http://localhost:3000/api/search
 - **Health API**: http://localhost:3000/api/health
-- **Redis**: localhost:6379
+- **Elasticsearch**: http://localhost:9200
-## 📊 Redis Commands
+## 📊 Elasticsearch Commands
 ```bash
-# Test connection
-redis-cli ping
+# Health
+curl http://localhost:9200/_cluster/health?pretty
-# Get database stats
-redis-cli INFO stats
+# Index stats
+curl http://localhost:9200/hasher/_stats?pretty
-# Count all keys
-redis-cli DBSIZE
+# Document count
+curl http://localhost:9200/hasher/_count?pretty
-# List all hash documents
-redis-cli KEYS "hash:plaintext:*"
+# Search
+curl http://localhost:9200/hasher/_search?pretty
-# Get document
-redis-cli GET "hash:plaintext:password"
-# Clear all data (CAUTION!)
-redis-cli FLUSHDB
+# Delete index (CAUTION!)
+curl -X DELETE http://localhost:9200/hasher
 ```
 ## 🐛 Troubleshooting
 | Problem | Solution |
 |---------|----------|
-| Can't connect to Redis | Check `REDIS_HOST` and `REDIS_PORT` env vars |
+| Can't connect to ES | Check `ELASTICSEARCH_NODE` env var |
 | Port 3000 in use | Use `PORT=3001 npm run dev` |
 | Module not found | Run `npm install` |
 | Build errors | Run `npm run build` to see details |
@@ -84,14 +81,17 @@ redis-cli FLUSHDB
 |------|---------|
 | `app/page.tsx` | Main UI component |
 | `app/api/search/route.ts` | Search endpoint |
-| `lib/redis.ts` | Redis configuration |
+| `lib/elasticsearch.ts` | ES configuration |
+| `lib/hash.ts` | Hash utilities |
+| `scripts/index-file.ts` | Bulk indexer |
 ## ⚙️ Environment Variables
 ```bash
+# Required
+ELASTICSEARCH_NODE=http://localhost:9200
 # Optional
-REDIS_HOST=localhost
-REDIS_PORT=6379
-REDIS_PASSWORD=your-password
-REDIS_DB=0
 NODE_ENV=production
 ```
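The environment table above has a single required variable with a sensible default. A client-setup sketch would resolve it along these lines — the helper name is hypothetical, and the repo's actual `lib/elasticsearch.ts` may configure more options (auth, TLS):

```typescript
// Resolve the Elasticsearch endpoint, falling back to the documented default.
// Sketch only; the real client setup in the project may differ.
function resolveNode(env: Record<string, string | undefined>): string {
  return env.ELASTICSEARCH_NODE ?? "http://localhost:9200";
}

console.log(resolveNode({})); // http://localhost:9200
```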

README.md (115 lines changed) · View file

@@ -1,9 +1,9 @@
# Hasher 🔐 # Hasher 🔐
A modern, high-performance hash search and generation tool powered by Redis and Next.js. Search for hash values to find their plaintext origins or generate hashes from any text input. A modern, high-performance hash search and generation tool powered by Elasticsearch and Next.js. Search for hash values to find their plaintext origins or generate hashes from any text input.
![Hasher Banner](https://img.shields.io/badge/Next.js-16.0-black?style=for-the-badge&logo=next.js) ![Hasher Banner](https://img.shields.io/badge/Next.js-16.0-black?style=for-the-badge&logo=next.js)
![Redis](https://img.shields.io/badge/Redis-7.x-DC382D?style=for-the-badge&logo=redis) ![Elasticsearch](https://img.shields.io/badge/Elasticsearch-8.x-005571?style=for-the-badge&logo=elasticsearch)
![TypeScript](https://img.shields.io/badge/TypeScript-5.x-3178C6?style=for-the-badge&logo=typescript) ![TypeScript](https://img.shields.io/badge/TypeScript-5.x-3178C6?style=for-the-badge&logo=typescript)
## ✨ Features ## ✨ Features
@@ -11,11 +11,10 @@ A modern, high-performance hash search and generation tool powered by Redis and
- 🔍 **Hash Lookup**: Search for MD5, SHA1, SHA256, and SHA512 hashes - 🔍 **Hash Lookup**: Search for MD5, SHA1, SHA256, and SHA512 hashes
- 🔑 **Hash Generation**: Generate multiple hash types from plaintext - 🔑 **Hash Generation**: Generate multiple hash types from plaintext
- 💾 **Auto-Indexing**: Automatically stores searched plaintext and hashes - 💾 **Auto-Indexing**: Automatically stores searched plaintext and hashes
- 📊 **Redis Backend**: Ultra-fast in-memory storage with persistence - 📊 **Elasticsearch Backend**: Scalable storage with 10 shards for performance
- 🚀 **Bulk Indexing**: Import wordlists via command-line script with resume capability - 🚀 **Bulk Indexing**: Import wordlists via command-line script
- 🎨 **Modern UI**: Beautiful, responsive interface with real-time feedback - 🎨 **Modern UI**: Beautiful, responsive interface with real-time feedback
- 📋 **Copy to Clipboard**: One-click copying of any hash value - 📋 **Copy to Clipboard**: One-click copying of any hash value
-**High Performance**: Lightning-fast searches with Redis indexing
## 🏗️ Architecture ## 🏗️ Architecture
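The "Hash Generation" feature in the list above amounts to computing four digests for one input. A minimal Node sketch — the function name is hypothetical and the project's actual `lib/hash.ts` may differ:

```typescript
import { createHash } from "node:crypto";

// Compute all four supported digests for a plaintext.
// Sketch only; not the project's actual implementation.
function generateHashes(plaintext: string) {
  const digest = (alg: string) =>
    createHash(alg).update(plaintext).digest("hex");
  return {
    md5: digest("md5"),
    sha1: digest("sha1"),
    sha256: digest("sha256"),
    sha512: digest("sha512"),
  };
}

console.log(generateHashes("password").md5); // 5f4dcc3b5aa765d61d8327deb882cf99
```

Storing all four digests at index time is what makes later lookups a simple exact-match query against whichever hash field the user supplies.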
@@ -33,9 +32,8 @@ A modern, high-performance hash search and generation tool powered by Redis and
 ┌─────────────┐
-│    Redis    │ ← In-memory storage
-│ (Key-Value  │   (localhost:6379)
-│  + Hashes)  │
+│Elasticsearch│ ← Distributed storage
+│  10 Shards  │   (localhost:9200)
 └─────────────┘
 ```
@@ -44,7 +42,7 @@ A modern, high-performance hash search and generation tool powered by Redis and
 ### Prerequisites
 - Node.js 18.x or higher
-- Redis 6.x or higher running on `localhost:6379`
+- Elasticsearch 8.x running on `localhost:9200`
 - npm or yarn
 ### Installation
@@ -60,33 +58,20 @@ A modern, high-performance hash search and generation tool powered by Redis and
 npm install
 ```
-3. **Start Redis** (if not already running)
+3. **Configure Elasticsearch** (optional)
+By default, the app connects to `http://localhost:9200`. To change this:
 ```bash
-# Using Docker
-docker run -d --name redis -p 6379:6379 redis:latest
-# Or using system package manager
-sudo systemctl start redis
+export ELASTICSEARCH_NODE=http://your-elasticsearch-host:9200
 ```
-4. **Configure Redis** (optional)
-By default, the app connects to `localhost:6379`. To change this:
-```bash
-export REDIS_HOST=your-redis-host
-export REDIS_PORT=6379
-export REDIS_PASSWORD=your-password  # if authentication is enabled
-export REDIS_DB=0  # database number
-```
-5. **Run the development server**
+4. **Run the development server**
 ```bash
 npm run dev
 ```
-6. **Open your browser**
+5. **Open your browser**
 Navigate to [http://localhost:3000](http://localhost:3000)
@@ -115,12 +100,6 @@ npm run index-file wordlist.txt
 # With custom batch size
 npm run index-file wordlist.txt -- --batch-size 500
-# Skip duplicate checking (faster)
-npm run index-file wordlist.txt -- --no-check
-# Resume interrupted indexing
-npm run index-file wordlist.txt -- --resume
 # Show help
 npm run index-file -- --help
 ```
@@ -135,11 +114,10 @@ qwerty
**Script features**: **Script features**:
- ✅ Bulk indexing with configurable batch size - ✅ Bulk indexing with configurable batch size
- ✅ Progress indicator and real-time stats - ✅ Progress indicator with percentage
- ✅ State persistence with resume capability
- ✅ Optional duplicate checking
- ✅ Error handling and reporting - ✅ Error handling and reporting
- ✅ Performance metrics (docs/sec) - ✅ Performance metrics (docs/sec)
- ✅ Automatic index refresh
## 🔌 API Reference ## 🔌 API Reference
@@ -193,17 +171,15 @@ Search for a hash or generate hashes from plaintext.
**GET** `/api/health` **GET** `/api/health`
Check Redis connection and index status. Check Elasticsearch connection and index status.
**Response**: **Response**:
```json ```json
{ {
"status": "ok", "status": "ok",
"redis": { "elasticsearch": {
"connected": true, "cluster": "elasticsearch",
"version": "7.0.15", "status": "green"
"usedMemory": 2097152,
"dbSize": 1542
}, },
"index": { "index": {
"exists": true, "exists": true,
@@ -216,33 +192,30 @@ Check Redis connection and index status.
} }
``` ```
-## 🗄️ Redis Data Structure
+## 🗄️ Elasticsearch Index

-### Key Structure
+### Index Configuration

-**Main Documents**: `hash:plaintext:{plaintext}`
-- Stores complete hash document as JSON string
-- Contains all hash algorithms and metadata
-
-**Hash Indexes**: `hash:index:{algorithm}:{hash}`
-- Reverse lookup from hash to plaintext
-- One key per algorithm (md5, sha1, sha256, sha512)
-- Value is the plaintext string
-
-**Statistics**: `hash:stats` (Redis Hash)
-- `count`: Total number of unique plaintexts
-- `size`: Approximate total size in bytes
+- **Name**: `hasher`
+- **Shards**: 10 (for horizontal scaling)
+- **Replicas**: 1 (for redundancy)

-### Document Schema
+### Mapping Schema

-```typescript
+```json
 {
-  "plaintext": string,
-  "md5": string,
-  "sha1": string,
-  "sha256": string,
-  "sha512": string,
-  "created_at": string (ISO 8601)
+  "plaintext": {
+    "type": "text",
+    "analyzer": "lowercase_analyzer",
+    "fields": {
+      "keyword": { "type": "keyword" }
+    }
+  },
+  "md5": { "type": "keyword" },
+  "sha1": { "type": "keyword" },
+  "sha256": { "type": "keyword" },
+  "sha512": { "type": "keyword" },
+  "created_at": { "type": "date" }
}
 ```
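Because each hash field is mapped as `keyword`, lookups are exact-match `term` queries against lowercased hex strings. A sketch of the query body such a lookup would use (the `buildHashQuery` helper is illustrative; only the field names come from the mapping above):

```typescript
type HashAlgorithm = 'md5' | 'sha1' | 'sha256' | 'sha512';

// Build the exact-match query body for a keyword-mapped hash field.
// Hashes are stored lowercase, so the input is normalized first.
function buildHashQuery(algorithm: HashAlgorithm, hash: string) {
  return { term: { [algorithm]: hash.toLowerCase() } };
}
```

This object is what would go under the `query` key of a search request against the `hasher` index.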
@@ -260,11 +233,10 @@ hasher/
 │   ├── page.tsx          # Main UI component
 │   └── globals.css       # Global styles
 ├── lib/
-│   ├── redis.ts          # Redis client & data layer
+│   ├── elasticsearch.ts  # ES client & index config
 │   └── hash.ts           # Hash utilities
 ├── scripts/
-│   ├── index-file.ts     # Bulk indexing script
-│   └── remove-duplicates.ts  # Duplicate removal utility
+│   └── index-file.ts     # Bulk indexing script
 ├── package.json
 ├── tsconfig.json
 ├── next.config.ts
@@ -285,10 +257,7 @@ npm run start

 Create a `.env.local` file:

 ```env
-REDIS_HOST=localhost
-REDIS_PORT=6379
-REDIS_PASSWORD=your-password
-REDIS_DB=0
+ELASTICSEARCH_NODE=http://localhost:9200
 ```
 ### Linting
@@ -330,7 +299,7 @@ This project is open source and available under the [MIT License](LICENSE).

 ## 🙏 Acknowledgments

 - Built with [Next.js](https://nextjs.org/)
-- Powered by [Redis](https://redis.io/)
+- Powered by [Elasticsearch](https://www.elastic.co/)
 - Icons by [Lucide](https://lucide.dev/)
 - Styled with [Tailwind CSS](https://tailwindcss.com/)
@@ -1,222 +0,0 @@
# Redis Migration - Quick Reference
## 🚀 Quick Start
### 1. Install Redis
```bash
# Ubuntu/Debian
sudo apt-get install redis-server
# macOS
brew install redis
# Start Redis
redis-server
# or
sudo systemctl start redis-server
```
### 2. Configure Environment (Optional)
```bash
# Create .env.local
cat > .env.local << EOF
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD= # Leave empty if no password
REDIS_DB=0
EOF
```
### 3. Start Application
```bash
yarn dev
```
## 🔍 Testing the Migration
### Test Health Endpoint
```bash
curl http://localhost:3000/api/health
```
Expected response:
```json
{
"status": "ok",
"redis": {
"version": "7.x",
"memory": "1.5M",
"dbSize": 0
},
"stats": {
"count": 0,
"size": 0
}
}
```
### Test Search API
```bash
# Generate hashes
curl -X POST http://localhost:3000/api/search \
-H "Content-Type: application/json" \
-d '{"query":"password"}'
# Search for hash
curl -X POST http://localhost:3000/api/search \
-H "Content-Type: application/json" \
-d '{"query":"5f4dcc3b5aa765d61d8327deb882cf99"}'
```
## 📊 Redis Commands
### Check Connection
```bash
redis-cli ping
# Should return: PONG
```
### View Data
```bash
# Count all keys
redis-cli DBSIZE
# List all documents
redis-cli KEYS "hash:plaintext:*"
# Get a specific document
redis-cli GET "hash:plaintext:password"
# Get statistics
redis-cli HGETALL hash:stats
# Search by hash
redis-cli GET "hash:index:md5:5f4dcc3b5aa765d61d8327deb882cf99"
```
### Clear Data (if needed)
```bash
# WARNING: Deletes ALL data in current database
redis-cli FLUSHDB
```
## 🔄 Bulk Indexing
### Basic Usage
```bash
yarn index-file sample-wordlist.txt
```
### Advanced Options
```bash
# Custom batch size
yarn index-file wordlist.txt -- --batch-size 500
# Skip duplicate checking (faster)
yarn index-file wordlist.txt -- --no-check
# Resume from previous state
yarn index-file wordlist.txt -- --resume
# Custom state file
yarn index-file wordlist.txt -- --state-file .my-state.json
```
## 🐛 Troubleshooting
### Cannot connect to Redis
```bash
# Check if Redis is running
redis-cli ping
# Check Redis status
sudo systemctl status redis-server
# View Redis logs
sudo journalctl -u redis-server -f
```
### Application shows Redis errors
1. Verify Redis is running: `redis-cli ping`
2. Check environment variables in `.env.local`
3. Check firewall rules if Redis is on another machine
4. Verify Redis password if authentication is enabled
### Clear stale state files
```bash
rm -f .indexer-state-*.json
```
## 📈 Monitoring
### Redis Memory Usage
```bash
redis-cli INFO memory
```
### Redis Stats
```bash
redis-cli INFO stats
```
### Application Stats
```bash
curl http://localhost:3000/api/health | jq .
```
## 🔒 Security (Production)
### Enable Redis Authentication
```bash
# Edit redis.conf
sudo nano /etc/redis/redis.conf
# Add/uncomment:
requirepass your-strong-password
# Restart Redis
sudo systemctl restart redis-server
```
### Update .env.local
```env
REDIS_PASSWORD=your-strong-password
```
## 📚 Key Differences from Elasticsearch
| Feature | Elasticsearch | Redis |
|---------|--------------|-------|
| Data Model | Document-based | Key-value |
| Search Complexity | O(log n) | O(1) |
| Setup | Complex cluster | Single instance |
| Memory | Higher | Lower |
| Latency | ~50ms | <10ms |
| Scaling | Shards/Replicas | Cluster/Sentinel |
## ✅ Verification Checklist
- [ ] Redis is installed and running
- [ ] Application builds without errors (`yarn build`)
- [ ] Health endpoint returns OK status
- [ ] Can generate hashes from plaintext
- [ ] Can search for generated hashes
- [ ] Statistics display on homepage
- [ ] Bulk indexing script works
- [ ] Data persists after application restart
## 📞 Support
- Redis Documentation: https://redis.io/docs/
- ioredis Documentation: https://github.com/redis/ioredis
- Project README: [README.md](README.md)
---
**Quick Test Command:**
```bash
# One-liner to test everything
redis-cli ping && yarn build && curl -s http://localhost:3000/api/health | jq .status
```
If all commands succeed, the migration is working correctly! ✅
@@ -9,7 +9,7 @@ This guide will help you quickly set up and test the Hasher application.
 Ensure you have:

 - ✅ Node.js 18.x or higher (`node --version`)
 - ✅ npm (`npm --version`)
-- ✅ Redis running on `localhost:6379`
+- ✅ Elasticsearch running on `localhost:9200`

 ### 2. Installation
@@ -26,7 +26,7 @@ npm run dev

 The application will be available at: **http://localhost:3000**

-### 3. Verify Redis Connection
+### 3. Verify Elasticsearch Connection

 ```bash
 # Check health endpoint
@@ -37,15 +37,7 @@ Expected response:

 ```json
 {
   "status": "ok",
-  "redis": {
-    "version": "7.x",
-    "memory": "1.5M",
-    "dbSize": 0
-  },
-  "stats": {
-    "count": 0,
-    "size": 0
-  }
+  "elasticsearch": { ... }
 }
 ```
@@ -94,18 +86,20 @@ npm run index-file sample-wordlist.txt

 ```
 📚 Hasher Indexer
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-Redis: localhost:6379
+Elasticsearch: http://localhost:9200
+Index: hasher
 File: sample-wordlist.txt
 Batch size: 100
-Duplicate check: enabled

-🔗 Connecting to Redis...
+🔗 Connecting to Elasticsearch...
 ✅ Connected successfully
 📖 Reading file...
 ✅ Found 20 words/phrases to process
-⏳ Progress: 20/20 (100.0%) - Indexed: 20, Skipped: 0, Errors: 0
+⏳ Progress: 20/20 (100.0%) - Indexed: 20, Errors: 0
+🔄 Refreshing index...
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 ✅ Indexing complete!
@@ -191,13 +185,13 @@ fetch('/api/search', {

 - [ ] Results display correctly

 ### Data Persistence

-- [ ] New plaintext is saved to Redis
+- [ ] New plaintext is saved to Elasticsearch
 - [ ] Saved hashes can be found in subsequent searches
 - [ ] Bulk indexing saves all entries
-- [ ] Redis keys are created with proper patterns
+- [ ] Index is created automatically if missing

 ### Error Handling

-- [ ] Redis connection errors are handled
+- [ ] Elasticsearch connection errors are handled
 - [ ] Empty search queries are prevented
 - [ ] Invalid input is handled gracefully
 - [ ] Network errors show user-friendly messages
@@ -206,16 +200,15 @@ fetch('/api/search', {
 ## 🐛 Common Issues & Solutions

-### Issue: Cannot connect to Redis
+### Issue: Cannot connect to Elasticsearch

 **Solution**:

 ```bash
-# Check if Redis is running
-redis-cli ping
+# Check if Elasticsearch is running
+curl http://localhost:9200

-# If not accessible, update the environment variables
-export REDIS_HOST=localhost
-export REDIS_PORT=6379
+# If not accessible, update the environment variable
+export ELASTICSEARCH_NODE=http://your-elasticsearch-host:9200
 npm run dev
 ```
@@ -249,34 +242,33 @@ npm run index-file -- "$(pwd)/sample-wordlist.txt"

 ---

-## 📊 Verify Data in Redis
+## 📊 Verify Data in Elasticsearch

-### Check Database Size
+### Check Index Stats

 ```bash
-redis-cli DBSIZE
+curl http://localhost:9200/hasher/_stats?pretty
 ```

-### Get Statistics
+### Count Documents

 ```bash
-redis-cli HGETALL hash:stats
+curl http://localhost:9200/hasher/_count?pretty
 ```

 ### View Sample Documents

 ```bash
-# List first 10 document keys
-redis-cli --scan --pattern "hash:plaintext:*" | head -10
-
-# Get a specific document
-redis-cli GET "hash:plaintext:password"
+curl "http://localhost:9200/hasher/_search?pretty&size=5"
 ```

 ### Search Specific Hash

 ```bash
-# Find document by MD5 hash
-redis-cli GET "hash:index:md5:5f4dcc3b5aa765d61d8327deb882cf99"
-
-# Then get the full document
-redis-cli GET "hash:plaintext:password"
+curl http://localhost:9200/hasher/_search?pretty -H 'Content-Type: application/json' -d'
+{
+  "query": {
+    "term": {
+      "md5": "5f4dcc3b5aa765d61d8327deb882cf99"
+    }
+  }
+}'
 ```
 ---
@@ -337,7 +329,7 @@ Create `search.json`:

 - [ ] CORS configuration
 - [ ] Rate limiting (if implemented)
 - [ ] Error message information disclosure
-- [ ] Redis authentication (if enabled)
+- [ ] Elasticsearch authentication (if enabled)

 ---
@@ -347,7 +339,7 @@ Before deploying to production:

 - [ ] All tests passing
 - [ ] Environment variables configured
-- [ ] Redis secured and backed up (RDB/AOF)
+- [ ] Elasticsearch secured and backed up
 - [ ] SSL/TLS certificates installed
 - [ ] Error logging configured
 - [ ] Monitoring set up
@@ -365,7 +357,7 @@ Before deploying to production:

 ## Environment

 - Node.js version:
-- Redis version:
+- Elasticsearch version:
 - Browser(s) tested:

 ## Test Results
@@ -1,29 +1,34 @@
 import { NextResponse } from 'next/server';
-import { getRedisInfo, getStats, INDEX_NAME } from '@/lib/redis';
+import { esClient, INDEX_NAME } from '@/lib/elasticsearch';

 export async function GET() {
   try {
-    // Check Redis connection and get info
-    const redisInfo = await getRedisInfo();
+    // Check Elasticsearch connection
+    const health = await esClient.cluster.health({});

-    // Get index stats
-    const stats = await getStats();
+    // Check if index exists
+    const indexExists = await esClient.indices.exists({ index: INDEX_NAME });
+
+    // Get index stats if exists
+    let stats = null;
+    if (indexExists) {
+      const statsResponse = await esClient.indices.stats({ index: INDEX_NAME });
+      stats = {
+        documentCount: statsResponse._all?.primaries?.docs?.count || 0,
+        indexSize: statsResponse._all?.primaries?.store?.size_in_bytes || 0
+      };
+    }

     return NextResponse.json({
       status: 'ok',
-      redis: {
-        connected: redisInfo.connected,
-        version: redisInfo.version,
-        usedMemory: redisInfo.usedMemory,
-        dbSize: redisInfo.dbSize
+      elasticsearch: {
+        cluster: health.cluster_name,
+        status: health.status
       },
       index: {
-        exists: true,
+        exists: indexExists,
         name: INDEX_NAME,
-        stats: {
-          documentCount: stats.count,
-          indexSize: stats.size
-        }
+        stats
       }
     });
   } catch (error) {
@@ -1,52 +1,152 @@
 import { NextRequest, NextResponse } from 'next/server';
-import { storeHashDocument, findByPlaintext, findByHash, initializeRedis } from '@/lib/redis';
+import { esClient, INDEX_NAME, initializeIndex } from '@/lib/elasticsearch';
 import { generateHashes, detectHashType } from '@/lib/hash';
interface HashDocument {
plaintext: string;
md5: string;
sha1: string;
sha256: string;
sha512: string;
created_at?: string;
}
// Maximum allowed query length
const MAX_QUERY_LENGTH = 1000;
// Characters that could be used in NoSQL/Elasticsearch injection attacks
const DANGEROUS_PATTERNS = [
/[{}\[\]]/g, // JSON structure characters
/\$[a-zA-Z]/g, // MongoDB-style operators
/\\u[0-9a-fA-F]{4}/g, // Unicode escapes
/<script/gi, // XSS attempts
/javascript:/gi, // XSS attempts
];
/**
* Sanitize input to prevent NoSQL injection attacks
* For hash lookups, we only need alphanumeric characters and $
* For plaintext, we allow more characters but sanitize dangerous patterns
*/
function sanitizeInput(input: string): string {
// Trim and take first word only
let sanitized = input.trim().split(/\s+/)[0] || '';
// Limit length
if (sanitized.length > MAX_QUERY_LENGTH) {
sanitized = sanitized.substring(0, MAX_QUERY_LENGTH);
}
// Remove null bytes
sanitized = sanitized.replace(/\0/g, '');
// Check for dangerous patterns
for (const pattern of DANGEROUS_PATTERNS) {
sanitized = sanitized.replace(pattern, '');
}
return sanitized;
}
/**
* Validate that the input is safe for use in Elasticsearch queries
*/
function isValidInput(input: string): boolean {
// Check for empty input
if (!input || input.length === 0) {
return false;
}
// Check for excessively long input
if (input.length > MAX_QUERY_LENGTH) {
return false;
}
// Check for control characters (except normal whitespace)
if (/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/.test(input)) {
return false;
}
return true;
}
 export async function POST(request: NextRequest) {
   try {
-    const { query } = await request.json();
+    const body = await request.json();

-    if (!query || typeof query !== 'string') {
+    // Validate request body structure
+    if (!body || typeof body !== 'object') {
       return NextResponse.json(
-        { error: 'Query parameter is required' },
+        { error: 'Invalid request body' },
         { status: 400 }
       );
     }

-    // Ensure Redis is connected
-    await initializeRedis();
-    const cleanQuery = query.trim().split(/\s+/)[0];
+    const { query } = body;
+
+    // Validate query type
+    if (!query || typeof query !== 'string') {
+      return NextResponse.json(
+        { error: 'Query parameter is required and must be a string' },
+        { status: 400 }
+      );
+    }
+
+    // Validate input before processing
+    if (!isValidInput(query)) {
+      return NextResponse.json(
+        { error: 'Invalid query: contains forbidden characters or is too long' },
+        { status: 400 }
+      );
+    }
+
+    // Sanitize input
+    const cleanQuery = sanitizeInput(query);

     if (!cleanQuery) {
       return NextResponse.json(
-        { error: 'Invalid query: only whitespace provided' },
+        { error: 'Invalid query: only whitespace or invalid characters provided' },
         { status: 400 }
       );
     }

+    // Ensure index exists
+    await initializeIndex();
+
     const cleanQueryLower = cleanQuery.toLowerCase();
     const hashType = detectHashType(cleanQueryLower);
     if (hashType) {
-      // Query is a hash - search for it in Redis
-      const doc = await findByHash(hashType, cleanQueryLower);
+      // Query is a hash - search for it in Elasticsearch
+      const searchResponse = await esClient.search<HashDocument>({
+        index: INDEX_NAME,
+        query: {
+          term: {
+            [hashType]: cleanQueryLower
+          }
+        }
+      });

-      if (doc) {
+      const hits = searchResponse.hits.hits;
+
+      if (hits.length > 0) {
         // Found matching plaintext
         return NextResponse.json({
           found: true,
           hashType,
           hash: cleanQuery,
-          results: [{
-            plaintext: doc.plaintext,
-            hashes: {
-              md5: doc.md5,
-              sha1: doc.sha1,
-              sha256: doc.sha256,
-              sha512: doc.sha512,
-            }
-          }]
+          results: hits.map((hit) => {
+            const source = hit._source!;
+            return {
+              plaintext: source.plaintext,
+              hashes: {
+                md5: source.md5,
+                sha1: source.sha1,
+                sha256: source.sha256,
+                sha512: source.sha512,
+              }
+            };
+          })
         });
       } else {
         // Hash not found in database
@@ -59,13 +159,20 @@ export async function POST(request: NextRequest) {
       }
     } else {
       // Query is plaintext - check if it already exists first
-      const existingDoc = await findByPlaintext(cleanQuery);
+      const existsResponse = await esClient.search<HashDocument>({
+        index: INDEX_NAME,
+        query: {
+          term: {
+            'plaintext.keyword': cleanQuery
+          }
+        }
+      });

       let hashes;
-      let wasGenerated = false;

-      if (existingDoc) {
+      if (existsResponse.hits.hits.length > 0) {
         // Plaintext found, retrieve existing hashes
+        const existingDoc = existsResponse.hits.hits[0]._source!;
         hashes = {
           md5: existingDoc.md5,
           sha1: existingDoc.sha1,
@@ -73,22 +180,44 @@
           sha512: existingDoc.sha512,
         };
       } else {
-        // Plaintext not found, generate and store hashes
-        hashes = await generateHashes(cleanQuery);
-        await storeHashDocument({
-          ...hashes,
-          created_at: new Date().toISOString()
-        });
-        wasGenerated = true;
+        // Plaintext not found, generate hashes and check if any hash already exists
+        hashes = generateHashes(cleanQuery);
+        const hashExistsResponse = await esClient.search<HashDocument>({
+          index: INDEX_NAME,
+          query: {
+            bool: {
+              should: [
+                { term: { md5: hashes.md5 } },
+                { term: { sha1: hashes.sha1 } },
+                { term: { sha256: hashes.sha256 } },
+                { term: { sha512: hashes.sha512 } },
+              ],
+              minimum_should_match: 1
+            }
+          }
+        });
+
+        if (hashExistsResponse.hits.hits.length === 0) {
+          // No duplicates found, insert new document
+          await esClient.index({
+            index: INDEX_NAME,
+            document: {
+              ...hashes,
+              created_at: new Date().toISOString()
+            }
+          });
+          // Refresh index to make the document searchable immediately
+          await esClient.indices.refresh({ index: INDEX_NAME });
+        }
       }

       return NextResponse.json({
         found: true,
         isPlaintext: true,
         plaintext: cleanQuery,
-        wasGenerated,
+        wasGenerated: existsResponse.hits.hits.length === 0,
         hashes: {
           md5: hashes.md5,
           sha1: hashes.sha1,
@@ -14,8 +14,8 @@ const geistMono = Geist_Mono({

 export const metadata: Metadata = {
   title: "Hasher - Hash Search & Generator",
-  description: "Search for hashes or generate them from plaintext. Supports MD5, SHA1, SHA256, and SHA512. Powered by Redis.",
+  description: "Search for hashes or generate them from plaintext. Supports MD5, SHA1, SHA256, and SHA512. Powered by Elasticsearch.",
-  keywords: ["hash", "md5", "sha1", "sha256", "sha512", "hash generator", "hash search", "redis"],
+  keywords: ["hash", "md5", "sha1", "sha256", "sha512", "hash generator", "hash search", "elasticsearch"],
   authors: [{ name: "Hasher" }],
   creator: "Hasher",
   publisher: "Hasher",
@@ -1,7 +1,8 @@
 'use client';

-import { useState, useEffect } from 'react';
-import { Search, Copy, Check, Hash, Key, AlertCircle, Loader2, Database } from 'lucide-react';
+import { useState, useEffect, useCallback, Suspense } from 'react';
+import { useSearchParams } from 'next/navigation';
+import { Search, Copy, Check, Hash, Key, AlertCircle, Loader2, Database, Link } from 'lucide-react';

 interface SearchResult {
   found: boolean;
@@ -45,13 +46,62 @@ function formatNumber(num: number): string {
   return num.toLocaleString();
 }

-export default function Home() {
+function HasherContent() {
+  const searchParams = useSearchParams();
   const [query, setQuery] = useState('');
   const [result, setResult] = useState<SearchResult | null>(null);
   const [loading, setLoading] = useState(false);
   const [error, setError] = useState('');
   const [copiedField, setCopiedField] = useState<string | null>(null);
   const [stats, setStats] = useState<IndexStats | null>(null);
+  const [copiedLink, setCopiedLink] = useState(false);
+  const [initialLoadDone, setInitialLoadDone] = useState(false);
const performSearch = useCallback(async (searchQuery: string, updateUrl: boolean = true) => {
if (!searchQuery.trim()) return;
setLoading(true);
setError('');
setResult(null);
try {
const response = await fetch('/api/search', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ query: searchQuery.trim() })
});
if (!response.ok) {
throw new Error('Search failed');
}
const data = await response.json();
setResult(data);
// Update URL with search query (using history API to avoid re-triggering effects)
if (updateUrl) {
const newUrl = new URL(window.location.href);
newUrl.searchParams.set('q', searchQuery.trim());
window.history.replaceState(null, '', newUrl.pathname + newUrl.search);
}
} catch (_err) {
setError('Failed to perform search. Please check your connection.');
} finally {
setLoading(false);
}
}, []);
// Load query from URL on mount (only once)
useEffect(() => {
if (initialLoadDone) return;
const urlQuery = searchParams.get('q');
if (urlQuery) {
setQuery(urlQuery);
performSearch(urlQuery, false);
}
setInitialLoadDone(true);
}, [searchParams, performSearch, initialLoadDone]);
   useEffect(() => {
     const fetchStats = async () => {
@@ -73,30 +123,7 @@ export default function Home() {
   const handleSearch = async (e: React.FormEvent) => {
     e.preventDefault();

-    if (!query.trim()) return;
-
-    setLoading(true);
-    setError('');
-    setResult(null);
-
-    try {
-      const response = await fetch('/api/search', {
-        method: 'POST',
-        headers: { 'Content-Type': 'application/json' },
-        body: JSON.stringify({ query: query.trim() })
-      });
-
-      if (!response.ok) {
-        throw new Error('Search failed');
-      }
-
-      const data = await response.json();
-      setResult(data);
-    } catch (_err) {
-      setError('Failed to perform search. Please check your connection.');
-    } finally {
-      setLoading(false);
-    }
+    performSearch(query);
   };
   const copyToClipboard = (text: string, field: string) => {
@@ -105,6 +132,14 @@ export default function Home() {
     setTimeout(() => setCopiedField(null), 2000);
   };

+  const copyShareLink = () => {
+    const url = new URL(window.location.href);
+    url.searchParams.set('q', query.trim());
+    navigator.clipboard.writeText(url.toString());
+    setCopiedLink(true);
+    setTimeout(() => setCopiedLink(false), 2000);
+  };
const HashDisplay = ({ label, value, field }: { label: string; value: string; field: string }) => ( const HashDisplay = ({ label, value, field }: { label: string; value: string; field: string }) => (
<div className="bg-gray-50 rounded-lg p-4 border border-gray-200"> <div className="bg-gray-50 rounded-lg p-4 border border-gray-200">
<div className="flex items-center justify-between mb-2"> <div className="flex items-center justify-between mb-2">
@@ -166,19 +201,35 @@ export default function Home() {
                 value={query}
                 onChange={(e) => setQuery(e.target.value)}
                 placeholder="Enter a hash or plaintext..."
-                className="w-full px-6 py-4 pr-14 text-lg rounded-2xl border-2 border-gray-200 focus:border-blue-500 focus:ring-4 focus:ring-blue-100 outline-none transition-all shadow-sm"
+                className="w-full px-6 py-4 pr-28 text-lg rounded-2xl border-2 border-gray-200 focus:border-blue-500 focus:ring-4 focus:ring-blue-100 outline-none transition-all shadow-sm"
               />
-              <button
-                type="submit"
-                disabled={loading || !query.trim()}
-                className="absolute right-2 top-1/2 -translate-y-1/2 bg-gradient-to-r from-blue-600 to-purple-600 text-white p-3 rounded-xl hover:shadow-lg disabled:opacity-50 disabled:cursor-not-allowed transition-all"
-              >
-                {loading ? (
-                  <Loader2 className="w-6 h-6 animate-spin" />
-                ) : (
-                  <Search className="w-6 h-6" />
-                )}
-              </button>
+              <div className="absolute right-2 top-1/2 -translate-y-1/2 flex gap-1">
+                {query.trim() && (
+                  <button
+                    type="button"
+                    onClick={copyShareLink}
+                    className="bg-gray-100 text-gray-600 p-3 rounded-xl hover:bg-gray-200 transition-all"
+                    title="Copy share link"
+                  >
+                    {copiedLink ? (
+                      <Check className="w-6 h-6 text-green-600" />
+                    ) : (
+                      <Link className="w-6 h-6" />
+                    )}
+                  </button>
+                )}
+                <button
+                  type="submit"
+                  disabled={loading || !query.trim()}
+                  className="bg-gradient-to-r from-blue-600 to-purple-600 text-white p-3 rounded-xl hover:shadow-lg disabled:opacity-50 disabled:cursor-not-allowed transition-all"
+                >
+                  {loading ? (
+                    <Loader2 className="w-6 h-6 animate-spin" />
+                  ) : (
+                    <Search className="w-6 h-6" />
+                  )}
+                </button>
+              </div>
             </div>
</form> </form>
@@ -308,10 +359,26 @@ export default function Home() {
{/* Footer */} {/* Footer */}
<footer className="mt-16 text-center text-gray-500 text-sm"> <footer className="mt-16 text-center text-gray-500 text-sm">
-        <p>Powered by Redis Built with Next.js</p>
+        <p>Powered by Elasticsearch Built with Next.js</p>
</footer> </footer>
</div> </div>
</div> </div>
); );
} }
function LoadingFallback() {
return (
<div className="min-h-screen bg-gradient-to-br from-blue-50 via-white to-purple-50 flex items-center justify-center">
<Loader2 className="w-12 h-12 text-blue-600 animate-spin" />
</div>
);
}
export default function Home() {
return (
<Suspense fallback={<LoadingFallback />}>
<HasherContent />
</Suspense>
);
}
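The share-link behavior added in `copyShareLink` above reduces to one piece of URL logic: put the trimmed query into the `q` parameter. A standalone sketch (the `buildShareLink` helper is illustrative; the component inlines this against `window.location.href`):

```typescript
// Build a shareable URL carrying the search query in ?q=,
// using the WHATWG URL API so encoding is handled for us.
function buildShareLink(base: string, query: string): string {
  const url = new URL(base);
  url.searchParams.set('q', query.trim());
  return url.toString();
}
```

The same `q` parameter is what the `useSearchParams` effect reads back on page load to re-run the search.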
lib/elasticsearch.ts (new file, 76 additions)
@@ -0,0 +1,76 @@
import { Client } from '@elastic/elasticsearch';
const ELASTICSEARCH_NODE = process.env.ELASTICSEARCH_NODE || 'http://localhost:9200';
const INDEX_NAME = 'hasher';
export const esClient = new Client({
node: ELASTICSEARCH_NODE,
requestTimeout: 30000,
maxRetries: 3,
});
export const INDEX_MAPPING = {
settings: {
number_of_shards: 10,
number_of_replicas: 1,
analysis: {
analyzer: {
lowercase_analyzer: {
type: 'custom' as const,
tokenizer: 'keyword',
filter: ['lowercase']
}
}
}
},
mappings: {
properties: {
plaintext: {
type: 'text' as const,
analyzer: 'lowercase_analyzer',
fields: {
keyword: {
type: 'keyword' as const
}
}
},
md5: {
type: 'keyword' as const
},
sha1: {
type: 'keyword' as const
},
sha256: {
type: 'keyword' as const
},
sha512: {
type: 'keyword' as const
},
created_at: {
type: 'date' as const
}
}
}
};
export async function initializeIndex(): Promise<void> {
try {
const indexExists = await esClient.indices.exists({ index: INDEX_NAME });
if (!indexExists) {
await esClient.indices.create({
index: INDEX_NAME,
settings: INDEX_MAPPING.settings,
mappings: INDEX_MAPPING.mappings
});
console.log(`Index '${INDEX_NAME}' created successfully with 10 shards`);
} else {
console.log(`Index '${INDEX_NAME}' already exists`);
}
} catch (error) {
console.error('Error initializing Elasticsearch index:', error);
throw error;
}
}
export { INDEX_NAME };
@@ -11,7 +11,7 @@ export interface HashResult {

 /**
  * Generate all common hashes for a given plaintext
  */
-export async function generateHashes(plaintext: string): Promise<HashResult> {
+export function generateHashes(plaintext: string): HashResult {
   return {
     plaintext,
     md5: crypto.createHash('md5').update(plaintext).digest('hex'),
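The now-synchronous hashing can be exercised in isolation. A minimal sketch using `node:crypto` (the `md5Hex`/`sha1Hex` helper names are illustrative; they mirror the `createHash(...).update(...).digest('hex')` pattern shown in the diff above):

```typescript
import { createHash } from 'node:crypto';

// Hex digest of a plaintext for one algorithm, as generateHashes does per field.
function md5Hex(plaintext: string): string {
  return createHash('md5').update(plaintext).digest('hex');
}

function sha1Hex(plaintext: string): string {
  return createHash('sha1').update(plaintext).digest('hex');
}
```

For example, `md5Hex('password')` yields `5f4dcc3b5aa765d61d8327deb882cf99`, the MD5 value used in the lookup examples earlier in this diff.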
@@ -1,178 +0,0 @@
import Redis from 'ioredis';
const REDIS_HOST = process.env.REDIS_HOST || 'localhost';
const REDIS_PORT = parseInt(process.env.REDIS_PORT || '6379', 10);
const REDIS_PASSWORD = process.env.REDIS_PASSWORD || undefined;
const REDIS_DB = parseInt(process.env.REDIS_DB || '0', 10);
export const INDEX_NAME = 'hasher';
// Create Redis client with connection pooling
export const redisClient = new Redis({
host: REDIS_HOST,
port: REDIS_PORT,
password: REDIS_PASSWORD,
db: REDIS_DB,
retryStrategy: (times) => {
const delay = Math.min(times * 50, 2000);
return delay;
},
maxRetriesPerRequest: 3,
enableReadyCheck: true,
lazyConnect: false,
});
// Handle connection errors
redisClient.on('error', (err) => {
console.error('Redis Client Error:', err);
});
redisClient.on('connect', () => {
console.log('Redis connected successfully');
});
/**
* Redis Keys Structure:
*
* 1. Hash documents: hash:plaintext:{plaintext} = JSON string
* - Stores all hash data for a plaintext
*
* 2. Hash indexes: hash:index:{algorithm}:{hash} = plaintext
* - Allows reverse lookup from hash to plaintext
* - One key per algorithm (md5, sha1, sha256, sha512)
*
* 3. Statistics: hash:stats = Hash {count, size}
* - count: total number of unique plaintexts
* - size: approximate total size in bytes
*/
export interface HashDocument {
plaintext: string;
md5: string;
sha1: string;
sha256: string;
sha512: string;
created_at: string;
}
/**
* Store a hash document in Redis
*/
export async function storeHashDocument(doc: HashDocument): Promise<void> {
const pipeline = redisClient.pipeline();
// Store main document
const key = `hash:plaintext:${doc.plaintext}`;
pipeline.set(key, JSON.stringify(doc));
// Create indexes for each hash type
pipeline.set(`hash:index:md5:${doc.md5}`, doc.plaintext);
pipeline.set(`hash:index:sha1:${doc.sha1}`, doc.plaintext);
pipeline.set(`hash:index:sha256:${doc.sha256}`, doc.plaintext);
pipeline.set(`hash:index:sha512:${doc.sha512}`, doc.plaintext);
// Update statistics
pipeline.hincrby('hash:stats', 'count', 1);
pipeline.hincrby('hash:stats', 'size', JSON.stringify(doc).length);
await pipeline.exec();
}
/**
* Find a hash document by plaintext
*/
export async function findByPlaintext(plaintext: string): Promise<HashDocument | null> {
const key = `hash:plaintext:${plaintext}`;
const data = await redisClient.get(key);
if (!data) return null;
return JSON.parse(data) as HashDocument;
}
/**
* Find a hash document by any hash value
*/
export async function findByHash(algorithm: string, hash: string): Promise<HashDocument | null> {
const indexKey = `hash:index:${algorithm}:${hash}`;
const plaintext = await redisClient.get(indexKey);
if (!plaintext) return null;
return findByPlaintext(plaintext);
}
/**
* Check if plaintext or any of its hashes exist
*/
export async function checkExistence(plaintext: string, hashes: {
md5: string;
sha1: string;
sha256: string;
sha512: string;
}): Promise<boolean> {
const pipeline = redisClient.pipeline();
pipeline.exists(`hash:plaintext:${plaintext}`);
pipeline.exists(`hash:index:md5:${hashes.md5}`);
pipeline.exists(`hash:index:sha1:${hashes.sha1}`);
pipeline.exists(`hash:index:sha256:${hashes.sha256}`);
pipeline.exists(`hash:index:sha512:${hashes.sha512}`);
const results = await pipeline.exec();
if (!results) return false;
// Check if any key exists
return results.some(([err, value]) => !err && value === 1);
}
/**
* Get index statistics
*/
export async function getStats(): Promise<{ count: number; size: number }> {
const stats = await redisClient.hgetall('hash:stats');
return {
count: parseInt(stats.count || '0', 10),
size: parseInt(stats.size || '0', 10)
};
}
/**
* Initialize Redis (compatibility function, Redis doesn't need explicit initialization)
*/
export async function initializeRedis(): Promise<void> {
// Check connection
await redisClient.ping();
console.log('Redis initialized successfully');
}
/**
* Get Redis info for health check
*/
export async function getRedisInfo(): Promise<{
connected: boolean;
version: string;
usedMemory: number;
dbSize: number;
}> {
const info = await redisClient.info('server');
const memory = await redisClient.info('memory');
const dbSize = await redisClient.dbsize();
// Parse Redis info string
const parseInfo = (infoStr: string, key: string): string => {
const match = infoStr.match(new RegExp(`${key}:(.+)`));
return match ? match[1].trim() : 'unknown';
};
return {
connected: redisClient.status === 'ready',
version: parseInfo(info, 'redis_version'),
usedMemory: parseInt(parseInfo(memory, 'used_memory'), 10) || 0,
dbSize
};
}
export { REDIS_HOST, REDIS_PORT };

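For reference outside any client library, the `HashDocument` shape used throughout these scripts can be produced with Node's built-in `crypto` module alone. A minimal, self-contained sketch (`buildHashDocument` is an illustrative name, not part of the codebase):

```typescript
import { createHash } from 'crypto';

interface HashDocument {
  plaintext: string;
  md5: string;
  sha1: string;
  sha256: string;
  sha512: string;
  created_at: string;
}

// Build the same document shape the module stores, using only node:crypto.
function buildHashDocument(plaintext: string): HashDocument {
  const digest = (algo: string) =>
    createHash(algo).update(plaintext).digest('hex');
  return {
    plaintext,
    md5: digest('md5'),
    sha1: digest('sha1'),
    sha256: digest('sha256'),
    sha512: digest('sha512'),
    created_at: new Date().toISOString(),
  };
}

console.log(buildHashDocument('hello').md5); // → 5d41402abc4b2a76b9719d911017c592
```

The four digests are deterministic, which is what makes the hash values usable as lookup keys.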
View file

@@ -1,14 +1,14 @@
 {
   "name": "hasher",
   "version": "1.0.0",
-  "description": "A modern hash search and generation tool powered by Redis and Next.js",
+  "description": "A modern hash search and generation tool powered by Elasticsearch and Next.js",
   "keywords": [
     "hash",
     "md5",
     "sha1",
     "sha256",
     "sha512",
-    "redis",
+    "elasticsearch",
     "nextjs",
     "cryptography",
     "security",
@@ -38,7 +38,7 @@
     "remove-duplicates": "tsx scripts/remove-duplicates.ts"
   },
   "dependencies": {
-    "ioredis": "^5.4.2",
+    "@elastic/elasticsearch": "^9.2.0",
     "lucide-react": "^0.555.0",
     "next": "15.4.8",
     "react": "19.1.2",
View file

@@ -4,7 +4,7 @@
  * Hasher Indexer Script
  *
  * This script reads a text file with one word/phrase per line and indexes
- * all the generated hashes into Redis.
+ * all the generated hashes into Elasticsearch.
  *
  * Usage:
  *   npx tsx scripts/index-file.ts <path-to-file.txt> [options]
@@ -19,16 +19,13 @@
  *   --help, -h            Show this help message
  */
-import Redis from 'ioredis';
+import { Client } from '@elastic/elasticsearch';
 import { createReadStream, existsSync, readFileSync, writeFileSync, unlinkSync } from 'fs';
 import { resolve, basename } from 'path';
 import { createInterface } from 'readline';
 import crypto from 'crypto';
-const REDIS_HOST = process.env.REDIS_HOST || 'localhost';
-const REDIS_PORT = parseInt(process.env.REDIS_PORT || '6379', 10);
-const REDIS_PASSWORD = process.env.REDIS_PASSWORD || undefined;
-const REDIS_DB = parseInt(process.env.REDIS_DB || '0', 10);
+const ELASTICSEARCH_NODE = process.env.ELASTICSEARCH_NODE || 'http://localhost:9200';
 const INDEX_NAME = 'hasher';
 const DEFAULT_BATCH_SIZE = 100;
@@ -159,7 +156,7 @@ function deleteState(stateFile: string): void {
   }
 }
-async function generateHashes(plaintext: string): Promise<HashDocument> {
+function generateHashes(plaintext: string): HashDocument {
   return {
     plaintext,
     md5: crypto.createHash('md5').update(plaintext).digest('hex'),
@@ -188,10 +185,7 @@ Options:
   --help, -h            Show this help message
 Environment Variables:
-  REDIS_HOST            Redis host (default: localhost)
-  REDIS_PORT            Redis port (default: 6379)
-  REDIS_PASSWORD        Redis password (optional)
-  REDIS_DB              Redis database number (default: 0)
+  ELASTICSEARCH_NODE    Elasticsearch node URL (default: http://localhost:9200)
 Examples:
   npx tsx scripts/index-file.ts wordlist.txt
@@ -215,14 +209,7 @@ Duplicate Checking:
 }
 async function indexFile(filePath: string, batchSize: number, shouldResume: boolean, checkDuplicates: boolean, customStateFile: string | null) {
-  const client = new Redis({
-    host: REDIS_HOST,
-    port: REDIS_PORT,
-    password: REDIS_PASSWORD,
-    db: REDIS_DB,
-    retryStrategy: (times) => Math.min(times * 50, 2000),
-  });
+  const client = new Client({ node: ELASTICSEARCH_NODE });
   const absolutePath = resolve(filePath);
   const stateFile = customStateFile || getDefaultStateFile(absolutePath);
   const fileHash = getFileHash(absolutePath);
@@ -260,7 +247,7 @@ async function indexFile(filePath: string, batchSize: number, shouldResume: bool
   console.log(`📚 Hasher Indexer`);
   console.log(`━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━`);
-  console.log(`Redis: ${REDIS_HOST}:${REDIS_PORT} (DB ${REDIS_DB})`);
+  console.log(`Elasticsearch: ${ELASTICSEARCH_NODE}`);
   console.log(`Index: ${INDEX_NAME}`);
   console.log(`File: ${filePath}`);
   console.log(`Batch size: ${batchSize}`);
@@ -294,8 +281,8 @@ async function indexFile(filePath: string, batchSize: number, shouldResume: bool
   try {
     // Test connection
-    console.log('🔗 Connecting to Redis...');
-    await client.ping();
+    console.log('🔗 Connecting to Elasticsearch...');
+    await client.cluster.health({});
     console.log('✅ Connected successfully\n');
     // Process file line by line using streams
@@ -318,90 +305,100 @@ async function indexFile(filePath: string, batchSize: number, shouldResume: bool
     if (batch.length === 0) return;
     if (isInterrupted) return;
+    const bulkOperations: any[] = [];
     // Generate hashes for all items in batch first
-    const batchWithHashes = await Promise.all(
-      batch.map(async (plaintext: string) => ({
-        plaintext,
-        hashes: await generateHashes(plaintext)
-      }))
-    );
-    const pipeline = client.pipeline();
-    let toIndex: typeof batchWithHashes = [];
+    const batchWithHashes = batch.map((plaintext: string) => ({
+      plaintext,
+      hashes: generateHashes(plaintext)
+    }));
     if (checkDuplicates) {
-      // Check which items already exist
-      const existenceChecks = await Promise.all(
-        batchWithHashes.map(async (item) => {
-          const plaintextExists = await client.exists(`hash:plaintext:${item.plaintext}`);
-          if (plaintextExists) return { item, exists: true };
-          // Check if any hash exists
-          const md5Exists = await client.exists(`hash:index:md5:${item.hashes.md5}`);
-          const sha1Exists = await client.exists(`hash:index:sha1:${item.hashes.sha1}`);
-          const sha256Exists = await client.exists(`hash:index:sha256:${item.hashes.sha256}`);
-          const sha512Exists = await client.exists(`hash:index:sha512:${item.hashes.sha512}`);
-          return {
-            item,
-            exists: md5Exists || sha1Exists || sha256Exists || sha512Exists
-          };
-        })
-      );
-      for (const check of existenceChecks) {
-        if (check.exists) {
+      // Check which items already exist (by plaintext or any hash)
+      const md5List = batchWithHashes.map((item: any) => item.hashes.md5);
+      const sha1List = batchWithHashes.map((item: any) => item.hashes.sha1);
+      const sha256List = batchWithHashes.map((item: any) => item.hashes.sha256);
+      const sha512List = batchWithHashes.map((item: any) => item.hashes.sha512);
+      const existingCheck = await client.search({
+        index: INDEX_NAME,
+        size: batchSize * 5,
+        query: {
+          bool: {
+            should: [
+              { terms: { 'plaintext.keyword': batch } },
+              { terms: { md5: md5List } },
+              { terms: { sha1: sha1List } },
+              { terms: { sha256: sha256List } },
+              { terms: { sha512: sha512List } },
+            ],
+            minimum_should_match: 1
+          }
+        },
+        _source: ['plaintext', 'md5', 'sha1', 'sha256', 'sha512']
+      });
+      // Create a set of existing hashes for quick lookup
+      const existingHashes = new Set<string>();
+      existingCheck.hits.hits.forEach((hit: any) => {
+        const src = hit._source;
+        existingHashes.add(src.plaintext);
+        existingHashes.add(src.md5);
+        existingHashes.add(src.sha1);
+        existingHashes.add(src.sha256);
+        existingHashes.add(src.sha512);
+      });
+      // Prepare bulk operations only for items that don't have any duplicate hash
+      for (const item of batchWithHashes) {
+        const isDuplicate =
+          existingHashes.has(item.plaintext) ||
+          existingHashes.has(item.hashes.md5) ||
+          existingHashes.has(item.hashes.sha1) ||
+          existingHashes.has(item.hashes.sha256) ||
+          existingHashes.has(item.hashes.sha512);
+        if (!isDuplicate) {
+          bulkOperations.push({ index: { _index: INDEX_NAME } });
+          bulkOperations.push(item.hashes);
+        } else {
           state.skipped++;
           sessionSkipped++;
-        } else {
-          toIndex.push(check.item);
         }
       }
     } else {
       // No duplicate checking - index everything
-      toIndex = batchWithHashes;
+      for (const item of batchWithHashes) {
+        bulkOperations.push({ index: { _index: INDEX_NAME } });
+        bulkOperations.push(item.hashes);
+      }
     }
-    // Execute bulk operations
-    if (toIndex.length > 0) {
+    // Execute bulk operation only if there are new items to insert
+    if (bulkOperations.length > 0) {
       try {
-        for (const item of toIndex) {
-          const doc = item.hashes;
-          const key = `hash:plaintext:${doc.plaintext}`;
-          // Store main document
-          pipeline.set(key, JSON.stringify(doc));
-          // Create indexes for each hash type
-          pipeline.set(`hash:index:md5:${doc.md5}`, doc.plaintext);
-          pipeline.set(`hash:index:sha1:${doc.sha1}`, doc.plaintext);
-          pipeline.set(`hash:index:sha256:${doc.sha256}`, doc.plaintext);
-          pipeline.set(`hash:index:sha512:${doc.sha512}`, doc.plaintext);
-          // Update statistics
-          pipeline.hincrby('hash:stats', 'count', 1);
-          pipeline.hincrby('hash:stats', 'size', JSON.stringify(doc).length);
-        }
-        const results = await pipeline.exec();
-        // Count errors
-        const errorCount = results?.filter(([err]) => err !== null).length || 0;
-        if (errorCount > 0) {
+        const bulkResponse = await client.bulk({
+          operations: bulkOperations,
+          refresh: false
+        });
+        if (bulkResponse.errors) {
+          const errorCount = bulkResponse.items.filter((item: any) => item.index?.error).length;
           state.errors += errorCount;
           sessionErrors += errorCount;
-          const successCount = toIndex.length - errorCount;
+          const successCount = (bulkOperations.length / 2) - errorCount;
           state.indexed += successCount;
           sessionIndexed += successCount;
         } else {
-          state.indexed += toIndex.length;
-          sessionIndexed += toIndex.length;
+          const count = bulkOperations.length / 2;
+          state.indexed += count;
+          sessionIndexed += count;
         }
       } catch (error) {
         console.error(`\n❌ Error processing batch:`, error);
-        state.errors += toIndex.length;
-        sessionErrors += toIndex.length;
+        const count = bulkOperations.length / 2;
+        state.errors += count;
+        sessionErrors += count;
       }
     }
@@ -453,8 +450,9 @@ async function indexFile(filePath: string, batchSize: number, shouldResume: bool
     return;
   }
-  // No refresh needed for Redis
-  console.log('\n\n✅ All data persisted to Redis');
+  // Refresh index
+  console.log('\n\n🔄 Refreshing index...');
+  await client.indices.refresh({ index: INDEX_NAME });
   // Delete state file on successful completion
   deleteState(stateFile);

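The duplicate pre-check in the indexer above reduces to a set-membership filter: collect every value already indexed, then keep only batch items whose plaintext and hashes are all unseen. A standalone sketch of that filter logic (types and names here are illustrative, and only `md5` is checked for brevity):

```typescript
interface Hashed {
  plaintext: string;
  md5: string;
}

// Given a set of values already indexed, keep only items whose plaintext
// and hash are both unseen — mirroring the bulk pre-check in the indexer.
function filterNew(batch: Hashed[], existing: Set<string>): Hashed[] {
  return batch.filter(
    (item) => !existing.has(item.plaintext) && !existing.has(item.md5)
  );
}

const existing = new Set(['hello', 'abc123']);
const batch: Hashed[] = [
  { plaintext: 'hello', md5: '5d41402abc4b2a76b9719d911017c592' },
  { plaintext: 'world', md5: '7d793037a0760186574b0282f2f435e7' },
];
console.log(filterNew(batch, existing).map((i) => i.plaintext)); // → [ 'world' ]
```

Doing one search per batch and filtering in memory, as the script does, replaces the per-item round trips of the old pipeline approach.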
View file

@@ -3,7 +3,7 @@
 /**
  * Hasher Duplicate Remover Script
  *
- * This script finds and removes duplicate entries from Redis.
+ * This script finds and removes duplicate entries from the Elasticsearch index.
  * It identifies duplicates by checking plaintext, md5, sha1, sha256, and sha512 fields.
  *
  * Usage:
@@ -13,20 +13,20 @@
  * Options:
  *   --dry-run             Show duplicates without removing them (default)
  *   --execute             Actually remove the duplicates
+ *   --batch-size=<number> Number of items to process in each batch (default: 1000)
  *   --field=<field>       Check duplicates only on this field (plaintext, md5, sha1, sha256, sha512)
  *   --help, -h            Show this help message
  */
-import Redis from 'ioredis';
+import { Client } from '@elastic/elasticsearch';
-const REDIS_HOST = process.env.REDIS_HOST || 'localhost';
-const REDIS_PORT = parseInt(process.env.REDIS_PORT || '6379', 10);
-const REDIS_PASSWORD = process.env.REDIS_PASSWORD || undefined;
-const REDIS_DB = parseInt(process.env.REDIS_DB || '0', 10);
+const ELASTICSEARCH_NODE = process.env.ELASTICSEARCH_NODE || 'http://localhost:9200';
 const INDEX_NAME = 'hasher';
+const DEFAULT_BATCH_SIZE = 1000;
 interface ParsedArgs {
   dryRun: boolean;
+  batchSize: number;
   field: string | null;
   showHelp: boolean;
 }
@@ -34,23 +34,15 @@ interface ParsedArgs {
 interface DuplicateGroup {
   value: string;
   field: string;
-  plaintexts: string[];
-  keepPlaintext: string;
-  deletePlaintexts: string[];
+  documentIds: string[];
+  keepId: string;
+  deleteIds: string[];
 }
-interface HashDocument {
-  plaintext: string;
-  md5: string;
-  sha1: string;
-  sha256: string;
-  sha512: string;
-  created_at: string;
-}
 function parseArgs(args: string[]): ParsedArgs {
   const result: ParsedArgs = {
     dryRun: true,
+    batchSize: DEFAULT_BATCH_SIZE,
     field: null,
     showHelp: false
   };
@@ -64,6 +56,21 @@ function parseArgs(args: string[]): ParsedArgs {
       result.dryRun = true;
     } else if (arg === '--execute') {
       result.dryRun = false;
+    } else if (arg.startsWith('--batch-size=')) {
+      const value = arg.split('=')[1];
+      const parsed = parseInt(value, 10);
+      if (!isNaN(parsed) && parsed > 0) {
+        result.batchSize = parsed;
+      }
+    } else if (arg === '--batch-size') {
+      const nextArg = args[i + 1];
+      if (nextArg && !nextArg.startsWith('-')) {
+        const parsed = parseInt(nextArg, 10);
+        if (!isNaN(parsed) && parsed > 0) {
+          result.batchSize = parsed;
+          i++;
+        }
+      }
     } else if (arg.startsWith('--field=')) {
       result.field = arg.split('=')[1];
     } else if (arg === '--field') {
@@ -89,15 +96,13 @@ Usage:
 Options:
   --dry-run             Show duplicates without removing them (default)
   --execute             Actually remove the duplicates
+  --batch-size=<number> Number of items to process in each batch (default: 1000)
   --field=<field>       Check duplicates only on this field
                         Valid fields: plaintext, md5, sha1, sha256, sha512
   --help, -h            Show this help message
 Environment Variables:
-  REDIS_HOST            Redis host (default: localhost)
-  REDIS_PORT            Redis port (default: 6379)
-  REDIS_PASSWORD        Redis password (optional)
-  REDIS_DB              Redis database number (default: 0)
+  ELASTICSEARCH_NODE    Elasticsearch node URL (default: http://localhost:9200)
 Examples:
   npx tsx scripts/remove-duplicates.ts    # Dry run, show all duplicates
@@ -114,137 +119,275 @@ Notes:
 }
 async function findDuplicatesForField(
-  client: Redis,
-  field: string
+  client: Client,
+  field: string,
+  batchSize: number
 ): Promise<DuplicateGroup[]> {
   const duplicates: DuplicateGroup[] = [];
-  console.log(`  Scanning for ${field} duplicates...`);
-  // Get all keys for this field type
-  const pattern = field === 'plaintext'
-    ? 'hash:plaintext:*'
-    : `hash:index:${field}:*`;
-  const keys = await client.keys(pattern);
-  // For hash indexes, group by hash value (not plaintext)
-  const valueMap = new Map<string, string[]>();
-  if (field === 'plaintext') {
-    // Each key is already unique for plaintext
-    // Check for same plaintext with different created_at
-    for (const key of keys) {
-      const plaintext = key.replace('hash:plaintext:', '');
-      if (!valueMap.has(plaintext)) {
-        valueMap.set(plaintext, []);
-      }
-      valueMap.get(plaintext)!.push(plaintext);
-    }
-  } else {
-    // For hash fields, get the plaintext and check if multiple plaintexts have same hash
-    for (const key of keys) {
-      const hashValue = key.replace(`hash:index:${field}:`, '');
-      const plaintext = await client.get(key);
-      if (plaintext) {
-        if (!valueMap.has(hashValue)) {
-          valueMap.set(hashValue, []);
-        }
-        valueMap.get(hashValue)!.push(plaintext);
-      }
-    }
-  }
-  // Find groups with duplicates
-  for (const [value, plaintexts] of valueMap) {
-    const uniquePlaintexts = Array.from(new Set(plaintexts));
-    if (uniquePlaintexts.length > 1) {
-      // Get documents to compare timestamps
-      const docs: { plaintext: string; doc: HashDocument }[] = [];
-      for (const plaintext of uniquePlaintexts) {
-        const docKey = `hash:plaintext:${plaintext}`;
-        const docData = await client.get(docKey);
-        if (docData) {
-          docs.push({ plaintext, doc: JSON.parse(docData) });
-        }
-      }
-      // Sort by created_at (oldest first)
-      docs.sort((a, b) =>
-        new Date(a.doc.created_at).getTime() - new Date(b.doc.created_at).getTime()
-      );
-      if (docs.length > 1) {
-        duplicates.push({
-          value,
-          field,
-          plaintexts: docs.map(d => d.plaintext),
-          keepPlaintext: docs[0].plaintext,
-          deletePlaintexts: docs.slice(1).map(d => d.plaintext)
-        });
-      }
-    }
-  }
+  // Use aggregation to find duplicate values
+  const fieldToAggregate = field === 'plaintext' ? 'plaintext.keyword' : field;
+  // Use composite aggregation to handle large number of duplicates
+  let afterKey: any = undefined;
+  let hasMore = true;
+  console.log(`  Scanning for duplicates...`);
+  while (hasMore) {
+    const aggQuery: any = {
+      index: INDEX_NAME,
+      size: 0,
+      aggs: {
+        duplicates: {
+          composite: {
+            size: batchSize,
+            sources: [
+              { value: { terms: { field: fieldToAggregate } } }
+            ],
+            ...(afterKey && { after: afterKey })
+          },
+          aggs: {
+            doc_count_filter: {
+              bucket_selector: {
+                buckets_path: { count: '_count' },
+                script: 'params.count > 1'
+              }
+            }
+          }
+        }
+      }
+    };
+    const response = await client.search(aggQuery);
+    const compositeAgg = response.aggregations?.duplicates as any;
+    const buckets = compositeAgg?.buckets || [];
+    for (const bucket of buckets) {
+      if (bucket.doc_count > 1) {
+        const value = bucket.key.value;
+        // Use scroll API for large result sets
+        const documentIds: string[] = [];
+        let scrollResponse = await client.search({
+          index: INDEX_NAME,
+          scroll: '1m',
+          size: 1000,
+          query: {
+            term: {
+              [fieldToAggregate]: value
+            }
+          },
+          sort: [
+            { created_at: { order: 'asc' } }
+          ],
+          _source: false
+        });
+        while (scrollResponse.hits.hits.length > 0) {
+          documentIds.push(...scrollResponse.hits.hits.map((hit: any) => hit._id));
+          if (!scrollResponse._scroll_id) break;
+          scrollResponse = await client.scroll({
+            scroll_id: scrollResponse._scroll_id,
+            scroll: '1m'
+          });
+        }
+        // Clear scroll
+        if (scrollResponse._scroll_id) {
+          await client.clearScroll({ scroll_id: scrollResponse._scroll_id }).catch(() => {});
+        }
+        if (documentIds.length > 1) {
+          duplicates.push({
+            value: String(value),
+            field,
+            documentIds,
+            keepId: documentIds[0], // Keep the oldest
+            deleteIds: documentIds.slice(1) // Delete the rest
+          });
+        }
+      }
+    }
+    // Check if there are more results
+    afterKey = compositeAgg?.after_key;
+    hasMore = buckets.length === batchSize && afterKey;
+    if (hasMore) {
+      process.stdout.write(`\r  Found ${duplicates.length} duplicate groups so far...`);
+    }
+  }
   return duplicates;
 }
-async function removeDuplicates(parsedArgs: ParsedArgs) {
-  const client = new Redis({
-    host: REDIS_HOST,
-    port: REDIS_PORT,
-    password: REDIS_PASSWORD,
-    db: REDIS_DB,
-  });
+/**
+ * Phase 1: Initialize and connect to Elasticsearch
+ */
+async function phase1_InitAndConnect() {
+  console.log(`🔍 Hasher Duplicate Remover - Phase 1: Initialization`);
+  console.log(`━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━`);
+  console.log(`Elasticsearch: ${ELASTICSEARCH_NODE}`);
+  console.log(`Index: ${INDEX_NAME}`);
+  console.log('');
+  const client = new Client({ node: ELASTICSEARCH_NODE });
+  console.log('🔗 Connecting to Elasticsearch...');
+  await client.cluster.health({});
+  console.log('✅ Connected successfully\n');
+  const countResponse = await client.count({ index: INDEX_NAME });
+  console.log(`📊 Total documents in index: ${countResponse.count}\n`);
+  return { client, totalDocuments: countResponse.count };
+}
+/**
+ * Phase 2: Find duplicates for a specific field
+ */
+async function phase2_FindDuplicatesForField(
+  client: Client,
+  field: string,
+  batchSize: number,
+  seenDeleteIds: Set<string>
+): Promise<{ duplicates: DuplicateGroup[], totalFound: number }> {
+  console.log(`\n🔍 Phase 2: Checking duplicates for field: ${field}...`);
+  const fieldDuplicates = await findDuplicatesForField(client, field, batchSize);
+  const duplicates: DuplicateGroup[] = [];
+  // Filter out already seen delete IDs to avoid counting the same document multiple times
+  for (const dup of fieldDuplicates) {
+    const newDeleteIds = dup.deleteIds.filter(id => !seenDeleteIds.has(id));
+    if (newDeleteIds.length > 0) {
+      dup.deleteIds = newDeleteIds;
+      newDeleteIds.forEach(id => seenDeleteIds.add(id));
+      duplicates.push(dup);
+    }
+  }
+  console.log(`  Found ${fieldDuplicates.length} duplicate groups for ${field}`);
+  console.log(`  New unique documents to delete: ${duplicates.reduce((sum, dup) => sum + dup.deleteIds.length, 0)}`);
+  // Force garbage collection if available
+  if (global.gc) {
+    global.gc();
+    console.log(`  ♻️ Memory freed after processing ${field}`);
+  }
+  return { duplicates, totalFound: fieldDuplicates.length };
+}
+/**
+ * Phase 3: Process deletion for a batch of duplicates
+ */
+async function phase3_DeleteBatch(
+  client: Client,
+  deleteIds: string[],
+  batchSize: number,
+  startIndex: number
+): Promise<{ deleted: number, errors: number }> {
+  const batch = deleteIds.slice(startIndex, startIndex + batchSize);
+  let deleted = 0;
+  let errors = 0;
+  try {
+    const bulkOperations = batch.flatMap(id => [
+      { delete: { _index: INDEX_NAME, _id: id } }
+    ]);
+    const bulkResponse = await client.bulk({
+      operations: bulkOperations,
+      refresh: false
+    });
+    if (bulkResponse.errors) {
+      const errorCount = bulkResponse.items.filter((item: any) => item.delete?.error).length;
+      errors += errorCount;
+      deleted += batch.length - errorCount;
+    } else {
+      deleted += batch.length;
+    }
+  } catch (error) {
+    console.error(`\n❌ Error deleting batch:`, error);
+    errors += batch.length;
+  }
+  // Force garbage collection if available
+  if (global.gc) {
+    global.gc();
+  }
+  return { deleted, errors };
+}
+/**
+ * Phase 4: Finalize and report results
+ */
+async function phase4_Finalize(
+  client: Client,
+  totalDeleted: number,
+  totalErrors: number,
+  initialDocumentCount: number
+) {
+  console.log('\n\n🔄 Phase 4: Refreshing index...');
+  await client.indices.refresh({ index: INDEX_NAME });
+  const newCountResponse = await client.count({ index: INDEX_NAME });
+  console.log('\n━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━');
+  console.log('✅ Duplicate removal complete!');
+  console.log(`━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━`);
+  console.log(`Documents deleted: ${totalDeleted}`);
+  console.log(`Errors: ${totalErrors}`);
+  console.log(`Previous document count: ${initialDocumentCount}`);
+  console.log(`New document count: ${newCountResponse.count}`);
+  console.log('');
+}
+async function removeDuplicates(parsedArgs: ParsedArgs) {
   const fields = parsedArgs.field
     ? [parsedArgs.field]
-    : ['md5', 'sha1', 'sha256', 'sha512'];
+    : ['plaintext', 'md5', 'sha1', 'sha256', 'sha512'];
-  console.log(`🔍 Hasher Duplicate Remover`);
-  console.log(`━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━`);
-  console.log(`Redis: ${REDIS_HOST}:${REDIS_PORT} (DB ${REDIS_DB})`);
-  console.log(`Index: ${INDEX_NAME}`);
   console.log(`Mode: ${parsedArgs.dryRun ? '🔎 DRY RUN (no changes)' : '⚠️ EXECUTE (will delete)'}`);
+  console.log(`Batch size: ${parsedArgs.batchSize}`);
   console.log(`Fields to check: ${fields.join(', ')}`);
   console.log('');
   try {
-    // Test connection
-    console.log('🔗 Connecting to Redis...');
-    await client.ping();
-    console.log('✅ Connected successfully\n');
-    // Get index stats
-    const stats = await client.hgetall('hash:stats');
-    const totalCount = parseInt(stats.count || '0', 10);
-    console.log(`📊 Total documents in index: ${totalCount}\n`);
-    const allDuplicates: DuplicateGroup[] = [];
-    const seenPlaintexts = new Set<string>();
-    // Find duplicates for each field
-    for (const field of fields) {
-      console.log(`🔍 Checking duplicates for field: ${field}...`);
-      const fieldDuplicates = await findDuplicatesForField(client, field);
-      // Filter out already seen plaintexts
-      for (const dup of fieldDuplicates) {
-        const newDeletePlaintexts = dup.deletePlaintexts.filter(p => !seenPlaintexts.has(p));
-        if (newDeletePlaintexts.length > 0) {
-          dup.deletePlaintexts = newDeletePlaintexts;
-          newDeletePlaintexts.forEach(p => seenPlaintexts.add(p));
-          allDuplicates.push(dup);
-        }
-      }
-      console.log(`  Found ${fieldDuplicates.length} duplicate groups for ${field}`);
-    }
-    const totalToDelete = allDuplicates.reduce((sum, dup) => sum + dup.deletePlaintexts.length, 0);
+    // === PHASE 1: Initialize ===
+    const { client, totalDocuments } = await phase1_InitAndConnect();
+    // Force garbage collection after phase 1
+    if (global.gc) {
+      global.gc();
+      console.log('♻️ Memory freed after initialization\n');
+    }
+    // === PHASE 2: Find duplicates field by field ===
+    const allDuplicates: DuplicateGroup[] = [];
+    const seenDeleteIds = new Set<string>();
+    for (const field of fields) {
+      const { duplicates } = await phase2_FindDuplicatesForField(
+        client,
+        field,
+        parsedArgs.batchSize,
+        seenDeleteIds
+      );
+      allDuplicates.push(...duplicates);
+      // Clear field duplicates to free memory
+      duplicates.length = 0;
+    }
+    const totalToDelete = allDuplicates.reduce((sum, dup) => sum + dup.deleteIds.length, 0);
     console.log(`\n━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━`);
     console.log(`📋 Summary:`);
@@ -254,7 +397,6 @@ async function removeDuplicates(parsedArgs: ParsedArgs) {
     if (allDuplicates.length === 0) {
       console.log('✨ No duplicates found! Index is clean.\n');
-      await client.quit();
       return;
     }
@@ -267,8 +409,8 @@ async function removeDuplicates(parsedArgs: ParsedArgs) {
         : dup.value;
       console.log(`  Field: ${dup.field}`);
       console.log(`  Value: ${truncatedValue}`);
-      console.log(`  Keep: ${dup.keepPlaintext}`);
-      console.log(`  Delete: ${dup.deletePlaintexts.length} document(s)`);
+      console.log(`  Keep: ${dup.keepId}`);
+      console.log(`  Delete: ${dup.deleteIds.length} document(s)`);
       console.log('');
     }
@@ -281,70 +423,44 @@ async function removeDuplicates(parsedArgs: ParsedArgs) {
       console.log(`🔎 DRY RUN - No changes made`);
       console.log(`   Run with --execute to remove ${totalToDelete} duplicate documents`);
       console.log(`━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n`);
-      await client.quit();
       return;
     }
-    // Execute deletion
-    console.log(`\n🗑 Removing ${totalToDelete} duplicate documents...\n`);
-    let deleted = 0;
-    let errors = 0;
-    for (const dup of allDuplicates) {
-      for (const plaintext of dup.deletePlaintexts) {
-        try {
-          const docKey = `hash:plaintext:${plaintext}`;
-          const docData = await client.get(docKey);
-          if (docData) {
-            const doc: HashDocument = JSON.parse(docData);
-            const pipeline = client.pipeline();
-            // Delete main document
-            pipeline.del(docKey);
-            // Delete all indexes
-            pipeline.del(`hash:index:md5:${doc.md5}`);
-            pipeline.del(`hash:index:sha1:${doc.sha1}`);
-            pipeline.del(`hash:index:sha256:${doc.sha256}`);
-            pipeline.del(`hash:index:sha512:${doc.sha512}`);
-            // Update statistics
-            pipeline.hincrby('hash:stats', 'count', -1);
-            pipeline.hincrby('hash:stats', 'size', -JSON.stringify(doc).length);
-            const results = await pipeline.exec();
-            if (results && results.some(([err]) => err !== null)) {
-              errors++;
-            } else {
-              deleted++;
-            }
-          }
-          process.stdout.write(`\r⏳ Progress: ${deleted + errors}/${totalToDelete} - Deleted: ${deleted}, Errors: ${errors}`);
-        } catch (error) {
-          console.error(`\n❌ Error deleting ${plaintext}:`, error);
-          errors++;
-        }
-      }
-    }
-    // Get new count
-    const newStats = await client.hgetall('hash:stats');
-    const newCount = parseInt(newStats.count || '0', 10);
-    console.log('\n\n━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━');
-    console.log('✅ Duplicate removal complete!');
-    console.log(`━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━`);
-    console.log(`Documents deleted: ${deleted}`);
-    console.log(`Errors: ${errors}`);
-    console.log(`Previous document count: ${totalCount}`);
-    console.log(`New document count: ${newCount}`);
-    console.log('');
-    await client.quit();
+    // === PHASE 3: Execute deletion in batches ===
+    console.log(`\n🗑 Phase 3: Removing ${totalToDelete} duplicate documents...\n`);
+    let totalDeleted = 0;
+    let totalErrors = 0;
+    const deleteIds = allDuplicates.flatMap(dup => dup.deleteIds);
+    // Clear allDuplicates to free memory
+    allDuplicates.length = 0;
+    // Delete in batches with memory management
+    for (let i = 0; i < deleteIds.length; i += parsedArgs.batchSize) {
+      const { deleted, errors } = await phase3_DeleteBatch(
+        client,
+        deleteIds,
+        parsedArgs.batchSize,
+        i
+      );
+      totalDeleted += deleted;
+      totalErrors += errors;
+      process.stdout.write(
+        `\r⏳ Progress: ${Math.min(i + parsedArgs.batchSize, deleteIds.length)}/${deleteIds.length} - ` +
+        `Deleted: ${totalDeleted}, Errors: ${totalErrors}`
+      );
+    }
+    // Clear deleteIds to free memory
+    deleteIds.length = 0;
+    seenDeleteIds.clear();
+    // === PHASE 4: Finalize ===
+    await phase4_Finalize(client, totalDeleted, totalErrors, totalDocuments);
   } catch (error) {
     console.error('\n❌ Error:', error instanceof Error ? error.message : error);
     process.exit(1);
@@ -369,10 +485,11 @@ if (parsedArgs.field && !validFields.includes(parsedArgs.field)) {
 console.log(`\n🔧 Configuration:`);
 console.log(`   Mode: ${parsedArgs.dryRun ? 'dry-run' : 'execute'}`);
+console.log(`   Batch size: ${parsedArgs.batchSize}`);
 if (parsedArgs.field) {
   console.log(`   Field: ${parsedArgs.field}`);
 } else {
-  console.log(`   Fields: all (md5, sha1, sha256, sha512)`);
+  console.log(`   Fields: all (plaintext, md5, sha1, sha256, sha512)`);
 }
 console.log('');
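The remover's core policy, keeping the oldest document in each group of duplicates and deleting the rest, can be sketched independently of Elasticsearch (all names and types here are illustrative):

```typescript
interface Doc {
  id: string;
  value: string;
  created_at: string;
}

// Group documents by value; within each group keep the oldest id and
// mark the rest for deletion, mirroring the keepId/deleteIds split above.
function planDeletions(docs: Doc[]): { keep: string[]; remove: string[] } {
  const groups = new Map<string, Doc[]>();
  for (const d of docs) {
    const g = groups.get(d.value) ?? [];
    g.push(d);
    groups.set(d.value, g);
  }
  const keep: string[] = [];
  const remove: string[] = [];
  for (const g of groups.values()) {
    // Oldest first, matching the created_at ascending sort in the script
    g.sort((a, b) => new Date(a.created_at).getTime() - new Date(b.created_at).getTime());
    keep.push(g[0].id);
    remove.push(...g.slice(1).map((d) => d.id));
  }
  return { keep, remove };
}

const plan = planDeletions([
  { id: 'a', value: 'x', created_at: '2025-01-02T00:00:00Z' },
  { id: 'b', value: 'x', created_at: '2025-01-01T00:00:00Z' },
  { id: 'c', value: 'y', created_at: '2025-01-03T00:00:00Z' },
]);
console.log(plan); // → { keep: [ 'b', 'c' ], remove: [ 'a' ] }
```

Deduplicating the delete list across fields (the `seenDeleteIds` set in the script) then simply prevents the same document id from being scheduled twice.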