"""
Breakpoint-first mosaic using a precomputed UCSC chain (minimap2 + transanno already run),
with optional on-the-fly generation if no chain is provided.

Pipeline:
  1) (Optional) If --chain not provided:
       minimap2 (A vs B)           -> PAF
       transanno minimap2-to-chain -> CHAIN
  2) Load CHAIN with the 'liftover' Python library to get an A->B converter.
  3) Simulate N breakpoints on A with a minimum spacing.
  4) Lift each A breakpoint to B (keep only those mapping to B's main chrom on '+'
     strand; require strictly increasing on B).
  5) Write paired A/B breakpoints to TSV.
  6) Build mosaic by alternating A and B between consecutive breakpoints, starting on A.

Requirements:
  - Python: liftover, pyfaidx, numpy
  - If --chain is NOT supplied: minimap2 and transanno must be in PATH.

Example:
  python build_mosaic.py --faA A.fa --faB B.fa --out-prefix results/run1 --chain precomputed.chain --n 4 --min-distance 1000000 --seed 13
"""
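
# Illustrative sketch of pipeline steps 3 and 6 described in the module docstring.
# This is not the script's own implementation. The function names
# (simulate_breakpoints_sketch, build_mosaic_sketch), the use of the standard-library
# `random` module instead of numpy, the 0-based coordinates, and the choice to keep
# breakpoints at least `min_distance` away from both sequence ends are all assumptions
# made for the example.

```python
import random
from typing import List


def simulate_breakpoints_sketch(length: int, n: int, min_distance: int, seed: int) -> List[int]:
    """Draw n sorted positions on a sequence, each at least min_distance apart.

    Hypothetical helper: positions are also kept min_distance away from both ends.
    """
    if n <= 0:
        return []
    if min_distance < 1:
        raise ValueError("min_distance must be >= 1")
    # Slack that remains after reserving the mandatory gaps and the two end margins.
    slack = length - 2 * min_distance - (n - 1) * min_distance
    if slack < 0:
        raise ValueError("sequence too short for the requested breakpoints")
    rng = random.Random(seed)
    # Sorted offsets inside the slack; adding i * min_distance restores the spacing.
    offsets = sorted(rng.randint(0, slack) for _ in range(n))
    return [min_distance + off + i * min_distance for i, off in enumerate(offsets)]


def build_mosaic_sketch(seq_a: str, seq_b: str, bps_a: List[int], bps_b: List[int]) -> str:
    """Alternate A and B segments between paired breakpoints, starting on A."""
    if len(bps_a) != len(bps_b):
        raise ValueError("breakpoint lists must be paired")
    # Segment boundaries on each sequence: start, breakpoints, end.
    bounds_a = [0, *bps_a, len(seq_a)]
    bounds_b = [0, *bps_b, len(seq_b)]
    parts = []
    for i in range(len(bps_a) + 1):
        if i % 2 == 0:
            # Even segments come from A, in A coordinates.
            parts.append(seq_a[bounds_a[i]:bounds_a[i + 1]])
        else:
            # Odd segments come from B, in the lifted B coordinates.
            parts.append(seq_b[bounds_b[i]:bounds_b[i + 1]])
    return "".join(parts)
```

# With two paired breakpoints, the sketch takes A up to the first breakpoint, B between
# the lifted positions of the first and second breakpoints, then A to its end. Requiring
# strictly increasing B positions (step 4) is what keeps the B slices non-empty and in order.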
from typing import List, Tuple

from liftover import ChainFile
from pyfaidx import Fasta

# check_tool and run are imported from a helper module whose name is not shown here.


def read_first_record(fa_path: str) -> Tuple[str, str]:
    """
    Read the first record from a FASTA file.
    """
    fa = Fasta(fa_path)
    if not fa.keys():
        raise ValueError(f"No records in {fa_path}")
    name = list(fa.keys())[0]
    # Return the record name and its full sequence, upper-cased.
    return name, str(fa[name][:].seq).upper()


# The original signature gives `width` a default value that is not recoverable here.
def write_fasta(path: str, header: str, seq: str, width: int):
    """
    Write a FASTA file.
    """
    with open(path, "w", encoding="utf-8") as fw:
        fw.write(f">{header}\n")
        # Emit the sequence in fixed-width lines.
        for i in range(0, len(seq), width):
            fw.write(seq[i:i + width] + "\n")