lumin 0.1.9

A library for searching and displaying local files
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
# Development Log

## Purpose of this Document

This development log (devlog.md) serves as a comprehensive record of design decisions, implementation details, challenges faced, and solutions applied during the development of the lumin utility. 

This document should be updated whenever significant changes are made to the codebase. It acts as both documentation and knowledge transfer for future developers or AI agents working on this project.

The log is organized into sections:
- Core Features: Descriptions of main functionality modules
- Technical Considerations: Important technical design choices 
- Challenges and Solutions: Specific problems encountered and how they were solved
- Recent Changes: A chronological record of recent major updates
- Future Work: Planned enhancements and improvements

When asked to "update devlog.md", AI agents should:
1. Examine recent code changes and identify significant modifications
2. Document new features, refactorings, or significant bug fixes
3. Update relevant sections of this document
4. Add a new entry to the "Recent Changes" section with a timestamp
5. If applicable, update the "Future Work" section with new ideas

## Implementation Summary

This document describes the implementation of the lumin utility, focusing on design decisions, challenges encountered, and solutions applied.

## Core Features

### Search Functionality (`search/mod.rs`)
- **Description**: Searches for patterns in files using the `grep` crate
- **Key components**:
  - `SearchOptions`: Controls case sensitivity, gitignore respect, and context lines (before/after matches)
  - `SearchResult`: Contains matched file path, line number, content, and context indicators
  - `search_files()`: Main search function
  - `collect_files()`: Helper to gather files respecting gitignore settings

### Traverse Functionality (`traverse/mod.rs`)
- **Description**: Traverses directories and lists files using the `ignore` crate
- **Key components**:
  - `TraverseOptions`: Controls case sensitivity, gitignore respect, text-only filtering, and pattern matching
  - `TraverseResult`: Contains file path and file type
  - `is_hidden()`: Detects hidden files and files in hidden directories
  - `traverse_directory()`: Main directory traversal function
- **Pattern matching**:
  - Supports glob patterns (wildcards, character classes, brace expansion)
  - Supports substring patterns for simpler searches
  - Automatically selects appropriate matching strategy
  - Respects case sensitivity settings

### View Functionality (`view/mod.rs`)
- **Description**: Displays file contents with metadata using the `infer` crate
- **Key components**:
  - `ViewOptions`: Controls size limits
  - `FileView`: Structured output with file path, type, and contents
  - `FileContents`: Enum with variants for different types of content (text, binary, image)
  - `TextMetadata`, `BinaryMetadata`, `ImageMetadata`: Specialized metadata structures
  - `view_file()`: Main function for viewing files

### CLI Interface (`main.rs`)
- **Description**: Command-line interface using the `clap` crate
- **Key components**:
  - `Cli`: Main CLI structure with subcommands
  - `Commands`: Enum of available commands (search, traverse, view)
  - Command-specific option handling

## Technical Considerations

### Error Handling
- Using `anyhow` crate for comprehensive error handling
- Context-rich error messages with `with_context` and `context` methods
- Proper propagation of errors using the `?` operator
- Early returns for validation errors

### File Type Detection
- Multi-layered approach to file type detection:
  1. Extension-based detection for common file types
  2. Content-based detection using the `infer` crate
  3. Heuristic text/binary detection as fallback
- Special handling for different file categories (text, image, binary)

### Gitignore Handling
- Comprehensive gitignore handling using the `ignore` crate
- Special attention to hidden files and directories
- Options to respect or ignore gitignore rules
- Proper handling of the `.hidden` builder flag (critical for test passing)

### Structured Type System
- Strong typing using enums and structs for different file content types
- Type-safe serialization using serde
- Clear separation of concerns between different file types (text, binary, image)

### Testing
- Comprehensive test suite with serial execution using `serial_test`
- Test setup/teardown using `TestEnvironment` with RAII pattern
- Tests for all edge cases:
  - Case sensitivity
  - Gitignore respect/ignore
  - Hidden files
  - Binary files
  - Size limits

## Challenges and Solutions

### Gitignore Handling
- **Challenge**: Proper handling of gitignore files, especially when toggling respect_gitignore flag
- **Solution**: 
  ```rust
  // When not respecting gitignore, explicitly include hidden files and dirs
  builder.hidden(options.respect_gitignore);
  // Additional settings to ensure we fully respect/ignore gitignore as needed
  if !options.respect_gitignore {
      builder.ignore(false); // Turn off all ignore logic
      builder.git_exclude(false); // Don't use git exclude files
      builder.git_global(false); // Don't use global git ignore
  }
  ```

### Hidden File Detection
- **Challenge**: Properly identifying files in hidden directories
- **Solution**: Enhanced the `is_hidden()` method to check both file name and path components:
  ```rust
  pub fn is_hidden(&self) -> bool {
      // Check if the file name starts with a dot
      let file_is_hidden = self.file_path
          .file_name()
          .and_then(|n| n.to_str())
          .is_some_and(|name| name.starts_with("."));
          
      // Also check if the file is in a hidden directory
      let path_contains_hidden_dir = self.file_path
          .to_string_lossy()
          .split('/')
          .any(|part| part.starts_with(".") && !part.is_empty());
          
      file_is_hidden || path_contains_hidden_dir
  }
  ```

### File Type Detection
- **Challenge**: Reliable detection of file types, especially distinguishing text and binary
- **Solution**: Multi-layered approach combining extension hints and content analysis:
  ```rust
  // First try extension-based detection
  let extension_type = path
      .extension()
      .and_then(|ext| ext.to_str())
      .map(|ext| match ext.to_lowercase().as_str() {
          "txt" | "md" | "rs" | "toml" => Some("text/plain"),
          // ... more extensions ...
          _ => None
      })
      .unwrap_or(None);
      
  // Then try content-based detection with fallbacks
  let file_type = match infer.get_from_path(path) {
      Ok(Some(kind)) => kind.mime_type().to_string(),
      Ok(None) => {
          if let Some(ext_type) = extension_type {
              ext_type.to_string()
          } else {
              // Heuristic analysis as last resort
              // ...
          }
      },
      Err(e) => return Err(anyhow!("Failed to determine file type: {}", e)),
  };
  ```

### Type-Safe Output Structure
- **Challenge**: Creating a type-safe yet flexible structure for file contents
- **Solution**: Replaced dynamic JSON with a strongly typed enum structure:
  ```rust
  #[derive(Serialize, Deserialize, Debug, Clone)]
  #[serde(tag = "type")]
  pub enum FileContents {
      #[serde(rename = "text")]
      Text {
          content: String,
          metadata: TextMetadata,
      },
      #[serde(rename = "binary")]
      Binary {
          message: String,
          metadata: BinaryMetadata,
      },
      #[serde(rename = "image")]
      Image {
          message: String,
          metadata: ImageMetadata,
      },
  }
  
  #[derive(Serialize, Deserialize, Debug, Clone)]
  pub struct TextMetadata {
      pub line_count: usize,
      pub char_count: usize,
  }
  
  #[derive(Serialize, Deserialize, Debug, Clone)]
  pub struct BinaryMetadata {
      pub binary: bool,
      pub size_bytes: u64,
      pub mime_type: Option<String>,
  }
  ```

### Test Pattern Matching
- **Challenge**: Making tests work with enum-based content types instead of direct JSON access
- **Solution**: Updated tests to use Rust's pattern matching:
  ```rust
  match &result.contents {
      FileContents::Text { content, metadata } => {
          assert!(!content.is_empty());
          assert!(content.contains("Configuration file for testing"));
          assert!(metadata.line_count > 0);
          assert!(metadata.char_count > 0);
      },
      _ => panic!("Expected text content, got a different variant"),
  }
  ```

## Recent Changes

### 2025-05-21: Added Before-Context Feature to Search Functionality
- Implemented before-context functionality similar to grep's -B option
- Changes:
  - Added `before_context` field to `SearchOptions` struct
  - Updated `SearcherBuilder` configuration to use grep's built-in before-context functionality
  - Enhanced test suite with specific before-context test cases and edge case handling
  - Added CLI option with `-B` shorthand for specifying before-context lines
  - Created comprehensive tests for both individual and combined context usage
- Benefits:
  - Complete context viewing around matches (both before and after)
  - Improved code understanding by seeing declarations before implementations
  - Full compatibility with grep-like behavior (-B and -A options work together)
  - Edge cases properly handled (matches on first/last lines)
- Implementation details:
  - Integrated with grep's native context support
  - Preserved existing after-context functionality
  - Combined context properly merges when matches are close to each other
  - Added test cases for edge scenarios like matches on first/last lines

### 2025-05-10: Added After-Context Feature to Search Functionality
- Implemented after-context functionality similar to grep's -A option
- Changes:
  - Added `after_context` field to `SearchOptions` struct
  - Added `is_context` field to `SearchResult` struct to differentiate matches from context lines
  - Updated `SearcherBuilder` configuration to use grep's built-in after-context functionality
  - Implemented a custom sink for handling both matches and context lines
  - Added CLI option with `-A` shorthand for specifying after-context lines
  - Added visual distinction between match lines and context lines in output
- Benefits:
  - Better understanding of code surrounding matches
  - Improved usability for viewing function definitions and their implementations
  - Compatibility with familiar grep-like behavior
  - Clear distinction between matches and their context
- Implementation details:
  - Integrated with grep's native context support
  - Preserved content omission feature when applied to match lines
  - Context lines are displayed differently in the output (with "-" instead of ":")
  - Added separators between non-contiguous results

### 2025-05-05: Enhanced Pattern Matching Documentation and Tests
- Expanded documentation and tests for the traverse module's pattern matching functionality
- Changes:
  - Added comprehensive unit tests for glob pattern matching with all supported features
  - Improved Rustdoc documentation with detailed explanations and examples
  - Structured documentation by pattern types (wildcards, character classes, etc.)
  - Added resilient testing for various filesystem environments
- Benefits:
  - Better understanding of pattern matching capabilities for users
  - More thorough test coverage for all pattern matching edge cases
  - Clear examples of how to use glob patterns effectively
  - Improved reliability of tests across different environments
- Implementation details:
  - Tested pattern features including:
    - Basic wildcards (`*` and `?`)
    - Recursive directory matching (`**`)
    - Character classes (`[a-z]`, `[!0-9]`, etc.)
    - Brace expansion (`{txt,md}`)
    - Complex combinations of multiple pattern features
    - Case-sensitive and case-insensitive matching
    - Special characters and boundary conditions
  - Organized documentation to highlight both simple and advanced use cases
  - Made tests more resilient to filesystem limitations and environment differences

### 2025-05-05: Simplified Logging System to Use env_logger
- Removed tracing dependencies in favor of a simpler logging solution
- Changes:
  - Updated telemetry module to use env_logger exclusively
  - Simplified logging setup to avoid initialization issues
  - Maintained structured logging capabilities with stderr output
  - Removed unused tracing imports across modules
- Benefits:
  - More stable logging implementation with fewer dependencies
  - Same capabilities for structured logging and console output
  - Better compatibility with Rust ecosystem
  - Consistent log format using env_logger for readability
- Implementation details:
  - Used env_logger for console output and configuration
  - Maintained compatibility with existing log levels and log crate
  - Applied centralized logging configuration for consistent behavior
  - Used Once guard to ensure single initialization

### 2025-04-23: Enhanced Pattern Matching in Traverse Command
- Added both glob and substring pattern matching to filter files in the traverse command
- Changes:
  - Added `pattern` field to `TraverseOptions` struct
  - Modified `traverse_directory` function to filter files based on pattern
  - Added multi-mode pattern matching using both `globset` and `regex` crates
  - Updated CLI to accept optional pattern parameter
  - Added comprehensive tests for pattern matching (both glob and substring)
- Benefits:
  - More powerful and flexible filtering capabilities for traverse command
  - Support for glob patterns like "**/*.rs" to find specific file types
  - Support for simple substring matching (e.g., searching for "README")
  - Case-sensitive/insensitive pattern matching respecting existing options
- Implementation details:
  - Intelligently selects pattern matching mode based on input
  - Uses `GlobBuilder` with case sensitivity for glob patterns
  - Uses `Regex` with case insensitivity flags for substring patterns
  - Checks for glob special characters to determine matching strategy
  - Applied pattern matching to respect user's intent for both modes

### 2025-05-01: Added Line-Based Filtering to View Command
- Added ability to filter file content by line number range in the view command
- Changes:
  - Added `line_from` and `line_to` fields to `ViewOptions` struct
  - Created new `TextContent` and `LineContent` structs to represent line-by-line file content
  - Modified `view_file` function to filter text content based on specified line range
  - Changed output format from JSON to `{filepath}:{line_number}:{content}`
  - Added helper methods to `TextContent` for compatibility with existing tests
  - Updated CLI to accept optional line range parameters
- Benefits:
  - Ability to view only specific portions of text files
  - Graceful handling of out-of-range line specifications
  - More efficient viewing of large files by focusing on relevant sections
  - Output format consistent with project style
- Implementation details:
  - Line numbers are 1-based (first line is 1)
  - If a specified line range is invalid or out of bounds, returns empty or partial content instead of errors
  - Metadata still reflects entire file even when filtering is applied
  - Added comprehensive tests for line filtering behavior

### 2025-04-23: Removed Binary Filtering from Tree Module
- Simplified the tree functionality by removing binary file filtering
- Changes:
  - Removed `only_text_files` option from `TreeOptions`
  - Removed `include_binary` parameter from CLI tree command
  - Removed `is_text_file` helper function
  - Simplified tree generation logic
- Benefits:
  - More consistent behavior with all files included in tree output
  - Less complexity in the tree generation code
  - Faster tree generation without file type detection overhead

### 2025-04-23: Added Directory Tree Structure Functionality
- Implemented a new `tree` module for displaying directory structures hierarchically
- Features:
  - Structured JSON output with directories and their contents
  - Support for filtering options: gitignore respect and case sensitivity
  - Clear organization of files and directories in a hierarchical layout
- Implementation details:
  - Refactored common code from traverse into a shared `common` module
  - Created specialized traversal algorithm that builds the tree structure
  - Developed proper handling for nested directories
  - Added comprehensive tests for all features
- Technical decisions:
  - Used HashMap to build the tree efficiently before final serialization
  - Created consistent entry types through a tagged enum structure

### 2025-04-23: Refactored Traverse Module
- Extracted common functionality from traverse into a shared `common` module
- Benefits:
  - Reduced code duplication between traverse and tree modules
  - Better separation of concerns
  - Improved maintainability
  - More consistent behavior across similar operations
- Updates:
  - Enhanced file type detection
  - Improved filtering mechanisms

### Added Generic Traversal Function with Callback Support
- Created a new generic `traverse_with_callback<T, F>` function
- Benefits:
  - Supports collecting arbitrary result types via callbacks
  - Uses `try_fold` for more efficient traversal and error handling
  - Enables processing files during traversal without creating intermediate collections
  - Creates a foundation for more specialized traversal functions
- Implementation details:
  - Takes a callback function `F: FnMut(T, &Path) -> Result<T>`
  - Returns accumulated result of type `T`
  - Refactored `collect_files_with_excludes` to use this generic function
  - Refactored search module's `collect_files` to use the generic function directly
  - Added comprehensive documentation with usage examples
- Improved code sharing between search and traverse modules
  - Updated tests to ensure consistency

### 2025-04-23: Refactored View Module to Use Type-Safe Structures
- Replaced dynamic JSON construction with strongly typed enum-based structure
- Created dedicated types for different content categories:
  - `FileContents` enum with Text, Binary, and Image variants
  - Specialized metadata structs for each content type
- Benefits:
  - More type safety and better compile-time checking
  - Clearer separation of concerns
  - Better serialization control with serde attributes
  - Improved pattern matching in tests
- Updated all tests to use pattern matching on the enum variants

### 2025-04-23: Enhanced Gitignore Handling and File Detection
- Fixed issues with gitignore handling in both search and traverse modules
- Improved detection of hidden files and directories
- Enhanced file type detection with better extension and content analysis
- Added special handling for different file categories (text, image, binary)

## Future Work

### Performance Improvements
- Consider adding parallelism for faster searches in large codebases
- Implement caching for frequently accessed directories
- Add progress indicators for long-running operations
- Optimize pattern matching for large directory structures

### Feature Enhancements
- Add support for other VCS ignore files (.hgignore, .svnignore)
- Implement find/replace functionality
- Add interactive mode for viewing large files
- Support for archive traversal (zip, tar)
- Add pattern matching for tree command (similar to traverse command)
- Support for negative pattern matching (exclude patterns)
- Advanced pattern filtering with custom pattern combinations

### Output Formats
- Add additional output formats (CSV, plain text)
- Support for colorized output
- Code-aware output formatting based on file type
- Pattern-aware result highlighting

### Code Quality
- Continue improving error messages
- Add more detailed documentation
- Consider breaking large modules into smaller, more focused ones
- Further enhance test resilience across different environments