On an Arduino MEGA (or an UNO whose ATmega328P has been replaced with an ATmega328PB, though you'd then have to adapt the code slightly), this can be done via the output compare modulator:
void setup() {
// CTC, OC0A toggle on compare match
TCCR0A = _BV(COM0A0)|_BV(WGM01);
// prescale 1/64 (the clock select bits live in TCCR0B, not TCCR0A)
TCCR0B = _BV(CS01)|_BV(CS00);
// 1ms at 16 MHz and 1/64 prescale
OCR0A = 249;
// fast PWM, TOP is OCR1A, OC1C clear on compare match and set at BOTTOM, no prescale
TCCR1A = _BV(COM1C1)|_BV(WGM11)|_BV(WGM10);
TCCR1B = _BV(WGM13)|_BV(WGM12)|_BV(CS10);
// 71.4375μs at 16 MHz and no prescale
OCR1A = 1142;
// 35.6875μs at 16 MHz and no prescale
OCR1C = 570;
// no interrupts from the timers
TIMSK0 = 0;
TIMSK1 = 0;
// only output a 1 when both timers are outputting a 1
PORTB &= ~_BV(PORTB7);
// halt the timers and reset the prescaler
GTCCR |= _BV(TSM);
GTCCR |= _BV(PSRSYNC);
// reset both timers
TCNT0 = 0;
TCNT1 = 0;
// enable output
DDRB |= _BV(DDB7);
// let the timers run
GTCCR &= ~_BV(TSM);
}
void loop() {
// We don't have to do anything here! Hooray!
}
With that sketch, pin 13 on the MEGA will have the desired waveform. There are two big advantages to this method. First, it doesn't require any CPU resources at all (no loops or interrupts) once it's set up, so you can do whatever else you want in loop() without thinking about it, or nothing at all if this is all this Arduino is responsible for. Second, it's as accurate as it's possible to get with a 16 MHz oscillator (the 500 Hz component is exact, and the 14 kHz component comes out at approximately 13998 Hz with a duty cycle of 49.956%). The one disadvantage is that millis() and everything else in the Arduino library that uses Timer0 or Timer1 internally won't work quite right, since we're taking over the hardware timers that they depend on.
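If you want different frequencies, the register values follow from simple arithmetic. Here's a rough sketch of that arithmetic in plain C++ (meant to be compiled on a PC, not uploaded to the board). The function names are made up for illustration, and it assumes the stock 16 MHz clock:

```cpp
#include <cstdint>

// Assumption: CPU clock of a stock Arduino MEGA, in Hz.
constexpr double kClockHz = 16000000.0;

// CTC mode with "toggle on compare match": the pin flips once every
// (OCRnA + 1) timer ticks, so a full output cycle takes twice that long.
constexpr double ctcToggleHz(uint32_t prescale, uint32_t ocr) {
    return kClockHz / (2.0 * prescale * (ocr + 1));
}

// Fast PWM with TOP taken from a register: one cycle is (TOP + 1) ticks.
constexpr double fastPwmHz(uint32_t prescale, uint32_t top) {
    return kClockHz / (prescale * (top + 1.0));
}

// Non-inverting fast PWM: the output is high for (OCR + 1) of (TOP + 1) ticks.
constexpr double fastPwmDuty(uint32_t ocr, uint32_t top) {
    return (ocr + 1.0) / (top + 1.0);
}
```

With the values from the sketch, ctcToggleHz(64, 249) gives exactly 500, fastPwmHz(1, 1142) gives about 13998.25, and fastPwmDuty(570, 1142) gives about 0.49956. To target another carrier frequency, pick the TOP value that brings fastPwmHz closest to it and set the compare value to roughly half of TOP.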